On Laplacian Eigenmaps for Dimensionality Reduction
Dr. Juan Orduz
PyData Berlin 2018
Overview
Introduction
Warming Up: The Spectral Theorem
Motivation: Toy Model Example
The Algorithm: Description and Justification
Examples: Scikit-Learn
Spectral Geometry*: The Laplacian and the Heat Kernel
Can One Hear the Shape of a Drum? [Kac66]

A differentiable manifold is a type of manifold that is locally similar enough to a linear space to allow one to do calculus. A (Riemannian) metric g allows us to measure distances.

[Figure: a local coordinate chart U ⊂ R^n on M]

We can consider the Laplacian L : C∞(M) −→ C∞(M) and its spectrum spec(L) = {λ0, λ1, · · · , λk, · · · −→ ∞}.
▸ If we are given spec(L), we can infer the dimension of M, its volume and its total scalar curvature.
Spectral Geometry for Dimensionality Reduction?

Let us assume we have data points x1, · · · , xk ∈ R^N which lie on an unknown submanifold M ⊂ R^N.

Key Observation
▸ Eigenfunctions of L on M can be used to define lower-dimensional embeddings.

Idea ([BN03])
▸ Model M by constructing a graph G = (V, E) where close data points are connected by edges.
▸ Construct the graph Laplacian L on G.
▸ Compute spec(L) and the corresponding eigenfunctions.
▸ Use these eigenfunctions to construct an embedding F : V −→ R^m for m < N.
The Spectral Theorem

Let A ∈ M_{n×n}(R) be a symmetric matrix, i.e. A = A†.

Recall
▸ λ ∈ C is an eigenvalue for A with eigenvector f ∈ R^n, f ≠ 0, if Af = λf.
▸ A set of vectors B = {f1, f2, · · · , fn} is a basis for R^n if:
  ▸ They are linearly independent.
  ▸ They generate R^n.
▸ B is said to be an orthonormal basis if 〈fi, fj〉 = δij.

Spectral Theorem
There exists an orthonormal basis of R^n consisting of eigenvectors of A. Each eigenvalue is real.
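As a quick numerical sanity check (my addition, not part of the original slides), NumPy's `eigh` realizes the Spectral Theorem for symmetric matrices:

```python
import numpy as np

# Build a random symmetric matrix A = A†.
rng = np.random.default_rng(42)
B = rng.normal(size=(5, 5))
A = (B + B.T) / 2

# eigh is specialized for symmetric matrices: it returns real
# eigenvalues and an orthonormal basis of eigenvectors.
eigenvalues, Q = np.linalg.eigh(A)

assert np.all(np.isreal(eigenvalues))                # each eigenvalue is real
assert np.allclose(Q.T @ Q, np.eye(5))               # columns are orthonormal
assert np.allclose(A @ Q, Q @ np.diag(eigenvalues))  # A fi = λi fi
```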
Min(Max)imizing Properties of Eigenvalues

Let A ∈ M_n(R) be a symmetric matrix with spectral decomposition λ0 ≤ λ1 ≤ · · · ≤ λn.

For later purposes, we would like to find

    argmax_{||f||=1} 〈Af, f〉.

▸ Define the associated Lagrange optimization problem
    L(f, λ) = 〈Af, f〉 − λ(||f||² − 1).
▸ Take the derivative with respect to f and set it to zero:
    (∂/∂f) L(f, λ) = 2(Af − λf) = 0.
▸ Hence,
    argmax_{||f||=1} 〈Af, f〉 = fn   and   argmin_{||f||=1} 〈Af, f〉 = f0.
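These extremal properties are easy to illustrate numerically (a small sketch of mine, not from the slides): the Rayleigh quotient 〈Af, f〉 over unit vectors is bounded by λ0 and λn, with the bounds attained at f0 and fn.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
A = (B + B.T) / 2  # symmetric matrix

# np.linalg.eigh returns eigenvalues in ascending order λ0 <= ... <= λn.
vals, vecs = np.linalg.eigh(A)
f0, fn = vecs[:, 0], vecs[:, -1]

# The extremes of the Rayleigh quotient are attained at f0 and fn.
assert np.isclose(f0 @ A @ f0, vals[0])
assert np.isclose(fn @ A @ fn, vals[-1])

# Any other unit vector stays between the two extremes.
f = rng.normal(size=6)
f /= np.linalg.norm(f)
assert vals[0] - 1e-9 <= f @ A @ f <= vals[-1] + 1e-9
```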
Step 0: Understand the Problem

Consider the problem of mapping these points to a line so that close points stay as close together as possible.

[Figure: four data points labelled 1, 2, 3, 4 in the plane]
Step 1: From Data to Adjacency Graph

▸ Define a distance function: first nearest neighbour.
▸ For each node, attach an edge for close points.

[Figure: the points 1, 2, 3, 4 joined by edges between nearest neighbours]
Step 2: Construct the Adjacency and Degree Matrices

[Figure: the resulting star graph, node 1 connected to nodes 2, 3 and 4]

    W =
    | 0 1 1 1 |
    | 1 0 0 0 |
    | 1 0 0 0 |
    | 1 0 0 0 |

    D =
    | 3 0 0 0 |
    | 0 1 0 0 |
    | 0 0 1 0 |
    | 0 0 0 1 |
Step 3: Spectrum of the Graph Laplacian

▸ Construct the operator L defined by

    L := D − W =
    |  3 −1 −1 −1 |
    | −1  1  0  0 |
    | −1  0  1  0 |
    | −1  0  0  1 |

▸ Consider the generalized eigenvalue problem
    Lf = λDf.
  Equivalently, D⁻¹Lf = λf.
▸ Eigenvalues: λ0 = 0, λ1 = 1, λ2 = 1, λ3 = 2.
▸ An eigenvector for λ1 = 1 is y := f1 = (0, −3, 1, 2).
▸ The vector y : V −→ R defines an embedding.

[Figure: the four nodes mapped to the line, appearing in the order 2, 1, 3, 4]
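The toy computation above can be reproduced with SciPy's generalized symmetric eigensolver, which solves Lf = λDf directly (a sketch of mine, not from the slides):

```python
import numpy as np
from scipy.linalg import eigh

# Star graph from Step 2: node 1 connected to nodes 2, 3, 4.
W = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
D = np.diag(W.sum(axis=1))  # degree matrix diag(3, 1, 1, 1)
L = D - W

# Generalized eigenvalue problem L f = λ D f; eigenvalues come back
# in ascending order: 0, 1, 1, 2, as on the slide.
eigenvalues, eigenvectors = eigh(L, D)

# An eigenvector for λ1 = 1 gives the 1-d embedding y; every vector
# in that eigenspace has y(1) = 0, matching y = (0, -3, 1, 2).
y = eigenvectors[:, 1]
```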
The Algorithm

Let x1, · · · , xk ∈ R^N.

1. Construct a weighted graph G = (V, E) with k nodes, one for each point, and a set of edges connecting neighbouring points. Select a distance function:
   ▸ (Euclidean Distance) Let ε > 0. We connect an edge between i and j if ||xi − xj||² < ε.
   ▸ n nearest neighbours.
2. Choose weights. If nodes i and j are connected, put
   ▸ Wij = 1, or
   ▸ (Heat Kernel) Wij := e^{−||xi − xj||²/t} for some t > 0.
3. Assume G is connected. Compute the eigenvalues of the generalized eigenvector problem Lf = λDf, where
   ▸ D is the diagonal weight matrix, Dii = ∑_{j=1}^{k} Wij.
   ▸ L := D − W is the graph Laplacian.
4. Construct the embedding. Let f0, f1, · · · , f_{k−1} be the corresponding eigenvectors ordered according to their eigenvalues (λ0 = 0). For m < N, set

    F(i) := (f1(i), · · · , fm(i)).
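The four steps can be sketched end-to-end in NumPy/SciPy. This is a minimal illustration assuming the n-nearest-neighbour graph, heat-kernel weights, and a connected G; the function and parameter names are mine:

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(X, n_neighbors=5, t=2.0, m=2):
    """Steps 1-4: neighbourhood graph, heat-kernel weights,
    generalized eigenproblem L f = lambda D f, embedding F."""
    k = X.shape[0]
    # Pairwise squared Euclidean distances ||xi - xj||^2.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Step 1: symmetric n-nearest-neighbour adjacency
    # (column 0 of the argsort is the point itself, so skip it).
    nearest = np.argsort(sq_dists, axis=1)[:, 1:n_neighbors + 1]
    adjacency = np.zeros((k, k), dtype=bool)
    adjacency[np.arange(k)[:, None], nearest] = True
    adjacency = adjacency | adjacency.T
    # Step 2: heat-kernel weights on edges.
    W = np.where(adjacency, np.exp(-sq_dists / t), 0.0)
    # Step 3: generalized eigenproblem L f = lambda D f.
    D = np.diag(W.sum(axis=1))
    L = D - W
    _, f = eigh(L, D)
    # Step 4: drop the constant eigenvector f0, keep f1, ..., fm.
    return f[:, 1:m + 1]

# Example: 60 points on a circle in R^3, embedded down to m = 2.
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(60)])
Y = laplacian_eigenmaps(X, n_neighbors=4, t=1.0, m=2)
print(Y.shape)  # (60, 2)
```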
Why does it work? m = 1

Assume you have constructed the weighted graph G = (V, E). We want to construct an embedding F : V −→ R.

Hint: Minimize

    J(y) := ∑_{i,j=1}^{k} (yi − yj)² Wij = 2y†Ly.

Thus, the problem reduces to finding

    argmin_{y†Dy=1, y†D1=0} y†Ly = argmin_{y†Dy=1, y†D1=0} 〈Ly, y〉

▸ y†Dy = 1 fixes the scale.
▸ y†D1 = 0 eliminates the trivial solution y = 1.

This translates to finding the minimum non-zero eigenvalue and eigenvector of

    Ly = λDy.
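The identity J(y) = 2y†Ly can be checked numerically on the toy star graph (a quick sketch of mine):

```python
import numpy as np

# Weighted star graph from the toy example.
W = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W

rng = np.random.default_rng(1)
y = rng.normal(size=4)

# J(y) = sum_ij (yi - yj)^2 Wij, computed directly from the definition ...
J = sum((y[i] - y[j]) ** 2 * W[i, j] for i in range(4) for j in range(4))

# ... equals 2 y† L y, since the sum expands to 2 y†Dy - 2 y†Wy.
assert np.isclose(J, 2 * y @ L @ y)
```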
Why does it work? m > 1 (Vectorize)

Assume you have constructed the weighted graph G = (V, E). We want to construct an embedding F : V −→ R^m.

Hint: Minimize, for Y = (y1 · · · ym) ∈ M_{k×m}(R),

    J(Y) := ∑_{i,j=1}^{k} ||Yi − Yj||² Wij = tr(Y†LY).

Thus, the problem reduces to finding

    argmin_{Y†DY=I} tr(Y†LY).

This translates to finding the minimum non-zero eigenvalues and eigenvectors of

    Ly = λDy.
Examples: Scikit-Learn
Let us go to a Jupyter notebook to see some examples.
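The notebook itself is not reproduced in this transcript; a minimal scikit-learn example in the same spirit (Laplacian eigenmaps is exposed as `SpectralEmbedding`; the swiss-roll dataset is my choice of example):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding

# A 2-d manifold (the swiss roll) embedded in R^3.
X, color = make_swiss_roll(n_samples=1000, random_state=0)

# Laplacian eigenmaps with an n-nearest-neighbour graph.
embedding = SpectralEmbedding(n_components=2,
                              affinity="nearest_neighbors",
                              n_neighbors=10,
                              random_state=0)
X_2d = embedding.fit_transform(X)
print(X_2d.shape)  # (1000, 2)
```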
The Laplacian

Second order differential operator L : C∞_c(M) −→ C∞_c(M).
▸ For M = R^n,

    L = −∑_{i=1}^{n} ∂²/∂xi².

▸ For (M, g) a Riemannian manifold,

    L = −∑_{i=1}^{n} ∑_{j=1}^{n} g^{ij} ∂²/∂xi∂xj + lower order terms.

Spectral Theorem ([Ros97])
L is symmetric with respect to the inner product in C∞_c(M),

    (f, g)_{L²} = ∫_M f(x)g(x) dx.

If M is compact, there exists an orthonormal basis of L²(M) consisting of eigenvectors of L. Each eigenvalue is real.
Embedding through Eigenmaps

Let (M, g) be a compact Riemannian manifold and f : M −→ R.
▸ If x, z ∈ M are close, then

    |f(x) − f(z)| ≤ dist_M(x, z) ||∇f|| + o(dist_M(x, z)).

▸ We want a map that best preserves locality on average,

    argmin_{||f||_{L²(M)}=1} ∫_M ||∇f||² dx.    (1)

▸ By Stokes' Theorem,

    ∫_M ||∇f||² dx = ∫_M (Lf) f dx = (Lf, f)_{L²}.

▸ Hence the minimum of (1) must be an eigenvalue of the Laplacian, attained at the corresponding eigenfunction.
The Graph Laplacian as a Differential Operator

[Figure: the star graph on nodes 1, 2, 3, 4 with edges e1 = (1, 2), e2 = (1, 3), e3 = (1, 4)]

    ∇ =
    | −1 1 0 0 |
    | −1 0 1 0 |
    | −1 0 0 1 |

    ⇒ ∇†∇ =
    |  3 −1 −1 −1 |
    | −1  1  0  0 |
    | −1  0  1  0 |
    | −1  0  0  1 |

So we see,

    L = ∇†∇.
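This factorization is easy to verify numerically for the star graph (my sketch):

```python
import numpy as np

# Incidence/gradient matrix: rows are the edges e1=(1,2), e2=(1,3),
# e3=(1,4); each row has -1 at the source node and +1 at the target.
grad = np.array([[-1, 1, 0, 0],
                 [-1, 0, 1, 0],
                 [-1, 0, 0, 1]], dtype=float)

L = grad.T @ grad  # should equal the graph Laplacian D - W

expected = np.array([[ 3, -1, -1, -1],
                     [-1,  1,  0,  0],
                     [-1,  0,  1,  0],
                     [-1,  0,  0,  1]], dtype=float)
assert np.array_equal(L, expected)
```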
The Heat Kernel

Let f : M −→ R. Consider the Heat Equation on M,

    (∂t + L) u(x, t) = 0 with initial condition u(x, 0) = f(x).

▸ The solution is given by ([Ros97])

    u(x, t) = ∫_M H_t(x, y) f(y) dy,

  where the Heat Kernel has the form

    H_t(x, y) = (4πt)^{−dim(M)/2} e^{−dist_M(x,y)²/(4t)} (φ(x, y) + O(t)),

  for a certain smooth function φ with φ(x, x) = 1.
▸ It can be shown that, for x1, · · · , xk ∈ M and t > 0 small,

    Lf(xi) ≈ (1/t) [ f(xi) − ( ∑_{0<||xi−xj||²<ε} e^{−||xi−xj||²/(4t)} f(xj) ) / ( ∑_{0<||xi−xj||²<ε} e^{−||xi−xj||²/(4t)} ) ],

  which justifies Wij = e^{−||xi−xj||²/(4t)}.
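A direct transcription of that weight choice (the helper name and the ε-neighbourhood truncation as written are mine):

```python
import numpy as np

def heat_kernel_weights(X, eps, t):
    """Wij = exp(-||xi - xj||^2 / (4t)) on the eps-neighbourhood graph,
    i.e. on pairs with 0 < ||xi - xj||^2 < eps; 0 elsewhere."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    mask = (sq_dists > 0) & (sq_dists < eps)
    return np.where(mask, np.exp(-sq_dists / (4 * t)), 0.0)

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
W = heat_kernel_weights(X, eps=2.0, t=0.5)
# Nodes 0 and 1 are connected (distance^2 = 1 < eps); node 2 is too
# far from both, so its row and column are zero.
print(W[0, 1])  # exp(-1/2)
```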
References

Slides and notebook available at juanitorduz.github.io

[BN03] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

[Kac66] Mark Kac. Can one hear the shape of a drum? The American Mathematical Monthly, 73(4):1–23, 1966.

[Ros97] Steven Rosenberg. The Laplacian on a Riemannian Manifold: An Introduction to Analysis on Manifolds. London Mathematical Society Student Texts. Cambridge University Press, 1997.