On Laplacian Eigenmaps for Dimensionality Reduction
Dr. Juan Orduz
PyData Berlin 2018
Overview
Introduction
Warming Up: The Spectral Theorem
Motivation: Toy Model Example
The Algorithm: Description and Justification
Examples: Scikit-Learn
Spectral Geometry*: The Laplacian and the Heat Kernel
Can One Hear the Shape of a Drum? [Kac66]

A differentiable manifold is a type of manifold that is locally similar enough to a linear space to allow one to do calculus. A (Riemannian) metric g allows us to measure distances.

[Figure: a local coordinate chart U ⊂ R^n on M]

We can consider the Laplacian L : C∞(M) −→ C∞(M) and its spectrum spec(L) = {λ0, λ1, · · · , λk, · · · −→ ∞}.
▸ If we are given spec(L), we can infer the dimension of M, its volume and its total scalar curvature.
Spectral Geometry for Dimensionality Reduction?

Let us assume we have data points x1, · · · , xk ∈ R^N which lie on an unknown submanifold M ⊂ R^N.

Key Observation
▸ Eigenfunctions of L on M can be used to define lower-dimensional embeddings.

Idea ([BN03])
▸ Model M by constructing a graph G = (V, E) where close data points are connected by edges.
▸ Construct the graph Laplacian L on G.
▸ Compute spec(L) and the corresponding eigenfunctions.
▸ Use these eigenfunctions to construct an embedding F : V −→ R^m for m < N.
The Spectral Theorem

Let A ∈ M_{n×n}(R) be a symmetric matrix, i.e. A = A†.

Recall
▸ λ ∈ C is an eigenvalue for A with eigenvector f ∈ R^n, f ≠ 0, if Af = λf.
▸ A set of vectors B = {f1, f2, · · · , fn} is a basis for R^n if:
  ▸ They are linearly independent.
  ▸ They generate R^n.
▸ B is said to be an orthonormal basis if 〈fi, fj〉 = δij.

Spectral Theorem
There exists an orthonormal basis of R^n consisting of eigenvectors of A. Each eigenvalue is real.
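As a quick numerical sanity check (my addition, not part of the original slides), NumPy's `eigh` realizes the Spectral Theorem for symmetric matrices:

```python
import numpy as np

# Build a random symmetric matrix A = A†.
rng = np.random.default_rng(42)
B = rng.normal(size=(5, 5))
A = (B + B.T) / 2

# eigh is specialized for symmetric matrices: it returns real
# eigenvalues and an orthonormal basis of eigenvectors.
eigenvalues, Q = np.linalg.eigh(A)

assert np.all(np.isreal(eigenvalues))                # each eigenvalue is real
assert np.allclose(Q.T @ Q, np.eye(5))               # columns are orthonormal
assert np.allclose(A @ Q, Q @ np.diag(eigenvalues))  # A fi = λi fi
```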
Min(Max)imizing Properties of Eigenvalues

Let A ∈ M_n(R) be a symmetric matrix with spectral decomposition λ0 ≤ λ1 ≤ · · · ≤ λn.

For later purposes, we would like to find

    argmax_{||f||=1} 〈Af, f〉.

▸ Define the associated Lagrange optimization problem
    L(f, λ) = 〈Af, f〉 − λ(||f||² − 1).
▸ Take the derivative with respect to f and set it to zero:
    (∂/∂f) L(f, λ) = 2(Af − λf) = 0.
▸ Hence,
    argmax_{||f||=1} 〈Af, f〉 = fn   and   argmin_{||f||=1} 〈Af, f〉 = f0.
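These extremal properties are easy to illustrate numerically (a small sketch of mine, not from the slides): the Rayleigh quotient 〈Af, f〉 over unit vectors is bounded by λ0 and λn, with the bounds attained at f0 and fn.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
A = (B + B.T) / 2  # symmetric matrix

# np.linalg.eigh returns eigenvalues in ascending order λ0 <= ... <= λn.
vals, vecs = np.linalg.eigh(A)
f0, fn = vecs[:, 0], vecs[:, -1]

# The extremes of the Rayleigh quotient are attained at f0 and fn.
assert np.isclose(f0 @ A @ f0, vals[0])
assert np.isclose(fn @ A @ fn, vals[-1])

# Any other unit vector stays between the two extremes.
f = rng.normal(size=6)
f /= np.linalg.norm(f)
assert vals[0] - 1e-9 <= f @ A @ f <= vals[-1] + 1e-9
```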
Step 0: Understand the Problem

Consider the problem of mapping these points to a line so that close points stay as close together as possible.

[Figure: four data points labelled 1, 2, 3, 4 in the plane]
Step 1: From Data to Adjacency Graph

▸ Define a distance function: first nearest neighbour.
▸ For each node, attach an edge for close points.

[Figure: the points 1, 2, 3, 4 joined by edges between nearest neighbours]
Step 2: Construct the Adjacency and Degree Matrices

[Figure: the resulting star graph, node 1 connected to nodes 2, 3 and 4]

    W =
    | 0 1 1 1 |
    | 1 0 0 0 |
    | 1 0 0 0 |
    | 1 0 0 0 |

    D =
    | 3 0 0 0 |
    | 0 1 0 0 |
    | 0 0 1 0 |
    | 0 0 0 1 |
Step 3: Spectrum of the Graph Laplacian

▸ Construct the operator L defined by

    L := D − W =
    |  3 −1 −1 −1 |
    | −1  1  0  0 |
    | −1  0  1  0 |
    | −1  0  0  1 |

▸ Consider the generalized eigenvalue problem
    Lf = λDf.
  Equivalently, D⁻¹Lf = λf.
▸ Eigenvalues: λ0 = 0, λ1 = 1, λ2 = 1, λ3 = 2.
▸ An eigenvector for λ1 = 1 is y := f1 = (0, −3, 1, 2).
▸ The vector y : V −→ R defines an embedding.

[Figure: the four nodes mapped to the line, appearing in the order 2, 1, 3, 4]
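The toy computation above can be reproduced with SciPy's generalized symmetric eigensolver, which solves Lf = λDf directly (a sketch of mine, not from the slides):

```python
import numpy as np
from scipy.linalg import eigh

# Star graph from Step 2: node 1 connected to nodes 2, 3, 4.
W = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
D = np.diag(W.sum(axis=1))  # degree matrix diag(3, 1, 1, 1)
L = D - W

# Generalized eigenvalue problem L f = λ D f; eigenvalues come back
# in ascending order: 0, 1, 1, 2, as on the slide.
eigenvalues, eigenvectors = eigh(L, D)

# An eigenvector for λ1 = 1 gives the 1-d embedding y; every vector
# in that eigenspace has y(1) = 0, matching y = (0, -3, 1, 2).
y = eigenvectors[:, 1]
```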
The Algorithm

Let x1, · · · , xk ∈ R^N.

1. Construct a weighted graph G = (V, E) with k nodes, one for each point, and a set of edges connecting neighbouring points. Select a distance function:
   ▸ (Euclidean Distance) Let ε > 0. We connect an edge between i and j if ||xi − xj||² < ε.
   ▸ n nearest neighbours.
2. Choose weights. If nodes i and j are connected, put
   ▸ Wij = 1, or
   ▸ (Heat Kernel) Wij := e^{−||xi − xj||²/t} for some t > 0.
3. Assume G is connected. Compute the eigenvalues of the generalized eigenvector problem Lf = λDf, where
   ▸ D is the diagonal weight matrix, Dii = ∑_{j=1}^{k} Wij.
   ▸ L := D − W is the graph Laplacian.
4. Construct the embedding. Let f0, f1, · · · , f_{k−1} be the corresponding eigenvectors ordered according to their eigenvalues (λ0 = 0). For m < N, set

    F(i) := (f1(i), · · · , fm(i)).
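The four steps can be sketched end-to-end in NumPy/SciPy. This is a minimal illustration assuming the n-nearest-neighbour graph, heat-kernel weights, and a connected G; the function and parameter names are mine:

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(X, n_neighbors=5, t=2.0, m=2):
    """Steps 1-4: neighbourhood graph, heat-kernel weights,
    generalized eigenproblem L f = lambda D f, embedding F."""
    k = X.shape[0]
    # Pairwise squared Euclidean distances ||xi - xj||^2.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Step 1: symmetric n-nearest-neighbour adjacency
    # (column 0 of the argsort is the point itself, so skip it).
    nearest = np.argsort(sq_dists, axis=1)[:, 1:n_neighbors + 1]
    adjacency = np.zeros((k, k), dtype=bool)
    adjacency[np.arange(k)[:, None], nearest] = True
    adjacency = adjacency | adjacency.T
    # Step 2: heat-kernel weights on edges.
    W = np.where(adjacency, np.exp(-sq_dists / t), 0.0)
    # Step 3: generalized eigenproblem L f = lambda D f.
    D = np.diag(W.sum(axis=1))
    L = D - W
    _, f = eigh(L, D)
    # Step 4: drop the constant eigenvector f0, keep f1, ..., fm.
    return f[:, 1:m + 1]

# Example: 60 points on a circle in R^3, embedded down to m = 2.
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(60)])
Y = laplacian_eigenmaps(X, n_neighbors=4, t=1.0, m=2)
print(Y.shape)  # (60, 2)
```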
Why does it work? m = 1

Assume you have constructed the weighted graph G = (V, E). We want to construct an embedding F : V −→ R.

Hint: Minimize

    J(y) := ∑_{i,j=1}^{k} (yi − yj)² Wij = 2y†Ly.

Thus, the problem reduces to finding

    argmin_{y†Dy=1, y†D1=0} y†Ly = argmin_{y†Dy=1, y†D1=0} 〈Ly, y〉

▸ y†Dy = 1 fixes the scale.
▸ y†D1 = 0 eliminates the trivial solution y = 1.

This translates to finding the minimum non-zero eigenvalue and eigenvector of

    Ly = λDy.
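The identity J(y) = 2y†Ly can be checked numerically on the toy star graph (a quick sketch of mine):

```python
import numpy as np

# Weighted star graph from the toy example.
W = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W

rng = np.random.default_rng(1)
y = rng.normal(size=4)

# J(y) = sum_ij (yi - yj)^2 Wij, computed directly from the definition ...
J = sum((y[i] - y[j]) ** 2 * W[i, j] for i in range(4) for j in range(4))

# ... equals 2 y† L y, since the sum expands to 2 y†Dy - 2 y†Wy.
assert np.isclose(J, 2 * y @ L @ y)
```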
Why does it work? m > 1 (Vectorize)

Assume you have constructed the weighted graph G = (V, E). We want to construct an embedding F : V −→ R^m.

Hint: Minimize, for Y = (y1 · · · ym) ∈ M_{k×m}(R),

    J(Y) := ∑_{i,j=1}^{k} ||Yi − Yj||² Wij = tr(Y†LY).

Thus, the problem reduces to finding

    argmin_{Y†DY=I} tr(Y†LY).

This translates to finding the minimum non-zero eigenvalues and eigenvectors of

    Ly = λDy.
Examples: Scikit-Learn
Let us go to a Jupyter notebook to see some examples.
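The notebook itself is not reproduced in this transcript; a minimal scikit-learn example in the same spirit (Laplacian eigenmaps is exposed as `SpectralEmbedding`; the swiss-roll dataset is my choice of example):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding

# A 2-d manifold (the swiss roll) embedded in R^3.
X, color = make_swiss_roll(n_samples=1000, random_state=0)

# Laplacian eigenmaps with an n-nearest-neighbour graph.
embedding = SpectralEmbedding(n_components=2,
                              affinity="nearest_neighbors",
                              n_neighbors=10,
                              random_state=0)
X_2d = embedding.fit_transform(X)
print(X_2d.shape)  # (1000, 2)
```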
The Laplacian

Second order differential operator L : C∞_c(M) −→ C∞_c(M).
▸ For M = R^n,

    L = −∑_{i=1}^{n} ∂²/∂xi².

▸ For (M, g) a Riemannian manifold,

    L = −∑_{i=1}^{n} ∑_{j=1}^{n} g^{ij} ∂²/∂xi∂xj + lower order terms.

Spectral Theorem ([Ros97])
L is symmetric with respect to the inner product in C∞_c(M),

    (f, g)_{L²} = ∫_M f(x)g(x) dx.

If M is compact, there exists an orthonormal basis of L²(M) consisting of eigenvectors of L. Each eigenvalue is real.
Embedding through Eigenmaps

Let (M, g) be a compact Riemannian manifold and f : M −→ R.
▸ If x, z ∈ M are close, then

    |f(x) − f(z)| ≤ dist_M(x, z) ||∇f|| + o(dist_M(x, z)).

▸ We want a map that best preserves locality on average,

    argmin_{||f||_{L²(M)}=1} ∫_M ||∇f||² dx.    (1)

▸ By Stokes' Theorem,

    ∫_M ||∇f||² dx = ∫_M (Lf) f dx = (Lf, f)_{L²}.

▸ Hence the minimum of (1) must be an eigenvalue of the Laplacian, attained at the corresponding eigenfunction.
The Graph Laplacian as a Differential Operator

[Figure: the star graph on nodes 1, 2, 3, 4 with edges e1 = (1, 2), e2 = (1, 3), e3 = (1, 4)]

    ∇ =
    | −1 1 0 0 |
    | −1 0 1 0 |
    | −1 0 0 1 |

    ⇒ ∇†∇ =
    |  3 −1 −1 −1 |
    | −1  1  0  0 |
    | −1  0  1  0 |
    | −1  0  0  1 |

So we see,

    L = ∇†∇.
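This factorization is easy to verify numerically for the star graph (my sketch):

```python
import numpy as np

# Incidence/gradient matrix: rows are the edges e1=(1,2), e2=(1,3),
# e3=(1,4); each row has -1 at the source node and +1 at the target.
grad = np.array([[-1, 1, 0, 0],
                 [-1, 0, 1, 0],
                 [-1, 0, 0, 1]], dtype=float)

L = grad.T @ grad  # should equal the graph Laplacian D - W

expected = np.array([[ 3, -1, -1, -1],
                     [-1,  1,  0,  0],
                     [-1,  0,  1,  0],
                     [-1,  0,  0,  1]], dtype=float)
assert np.array_equal(L, expected)
```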
The Heat Kernel

Let f : M −→ R. Consider the Heat Equation on M,

    (∂t + L) u(x, t) = 0 with initial condition u(x, 0) = f(x).

▸ The solution is given by ([Ros97])

    u(x, t) = ∫_M H_t(x, y) f(y) dy,

  where the Heat Kernel has the form

    H_t(x, y) = (4πt)^{−dim(M)/2} e^{−dist_M(x,y)²/(4t)} (φ(x, y) + O(t)),

  for a certain smooth function φ with φ(x, x) = 1.
▸ It can be shown that, for x1, · · · , xk ∈ M and t > 0 small,

    Lf(xi) ≈ (1/t) [ f(xi) − ( ∑_{0<||xi−xj||²<ε} e^{−||xi−xj||²/(4t)} f(xj) ) / ( ∑_{0<||xi−xj||²<ε} e^{−||xi−xj||²/(4t)} ) ],

  which justifies Wij = e^{−||xi−xj||²/(4t)}.
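A direct transcription of that weight choice (the helper name and the ε-neighbourhood truncation as written are mine):

```python
import numpy as np

def heat_kernel_weights(X, eps, t):
    """Wij = exp(-||xi - xj||^2 / (4t)) on the eps-neighbourhood graph,
    i.e. on pairs with 0 < ||xi - xj||^2 < eps; 0 elsewhere."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    mask = (sq_dists > 0) & (sq_dists < eps)
    return np.where(mask, np.exp(-sq_dists / (4 * t)), 0.0)

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
W = heat_kernel_weights(X, eps=2.0, t=0.5)
# Nodes 0 and 1 are connected (distance^2 = 1 < eps); node 2 is too
# far from both, so its row and column are zero.
print(W[0, 1])  # exp(-1/2)
```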
References

Slides and notebook available at juanitorduz.github.io

[BN03] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

[Kac66] Mark Kac. Can one hear the shape of a drum? The American Mathematical Monthly, 73(4):1–23, 1966.

[Ros97] Steven Rosenberg. The Laplacian on a Riemannian Manifold: An Introduction to Analysis on Manifolds. London Mathematical Society Student Texts. Cambridge University Press, 1997.