A Geometric Perspective on Machine Learning

Xiaofei He, College of Computer Science, Zhejiang University


Page 1: A Geometric Perspective on Machine Learning

Xiaofei He, College of Computer Science, Zhejiang University

Page 2

Machine Learning: the problem

f: X → Y, learned from information (training data).

X and Y are usually considered as Euclidean spaces.

Page 3

Manifold Learning: geometric perspective

The data space may not be a Euclidean space, but a nonlinear manifold.

Instead of:
☒ Euclidean distance
☒ f defined on a Euclidean space
☒ ambient dimension

use:
☑ geodesic distance
☑ f defined on the nonlinear manifold
☑ manifold dimension

Page 4

Manifold Learning: the challenges

The manifold is unknown! We have only samples!

How do we know whether M is a sphere, a torus, or something else?

How do we compute the distance on M?

[Figure: the underlying manifold (unknown) versus the sample points (what we have)]

Relevant tools: topology, geometry, functional analysis.

Page 5

Manifold Learning: current solution

Find a Euclidean embedding, and then perform traditional learning algorithms in the Euclidean space.

Page 6

Simplicity

Page 7

Simplicity

Page 8

Simplicity is relative

Page 9

Manifold-based Dimensionality Reduction

Given high-dimensional data sampled from a low-dimensional manifold, how do we compute a faithful embedding?

How do we find the mapping function f?

How do we efficiently find the projective function f?

Page 10

A Good Mapping Function

If x_i and x_j are close to each other, we hope f(x_i) and f(x_j) preserve the local structure (distance, similarity, …).

k-nearest neighbor graph:

Objective function: different algorithms have different concerns.
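As a concrete sketch of the k-nearest neighbor graph above, with heat-kernel weights (a common choice in this literature; the bandwidth `t` and neighbor count `k` are illustrative assumptions, not values taken from the slides):

```python
import numpy as np

def knn_graph(X, k=5, t=1.0):
    """Build a symmetric k-NN graph with heat-kernel weights.

    X: (n, d) array of data points.
    Returns W, an (n, n) matrix with W[i, j] = exp(-||xi - xj||^2 / t)
    if xj is among the k nearest neighbors of xi (or vice versa), else 0.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        # Indices of the k nearest neighbors of point i (excluding i itself).
        nbrs = np.argsort(sq[i])[1:k + 1]
        W[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    # Symmetrize: keep an edge if it appears in either direction.
    return np.maximum(W, W.T)
```

Different algorithms then plug this W into different objective functions, as the slide notes.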

Page 11

Locality Preserving Projections

Principle: if x_i and x_j are close, then their maps y_i and y_j are also close.

Page 12

Locality Preserving Projections (continued)

Mathematical formulation: minimize the integral of the gradient of f.

Page 13

Locality Preserving Projections (continued)

Stokes' Theorem:

Page 14

Locality Preserving Projections (continued)

LPP finds a linear approximation to the nonlinear manifold, while preserving the local geometric structure.
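The equations on these LPP slides were images and did not survive extraction; a sketch of the standard formulation they correspond to (the graph weight matrix W, degree matrix D, and Laplacian L = D - W are assumed from the k-NN graph construction earlier):

```latex
% Continuous objective: find f minimizing the gradient energy on the manifold M
\min_{f} \int_{M} \|\nabla f\|^{2}

% Discrete analogue on the k-NN graph, with D_{ii} = \sum_j W_{ij}, L = D - W:
\min_{\mathbf{y}} \sum_{i,j} (y_i - y_j)^2\, W_{ij} \;=\; 2\,\mathbf{y}^{T} L\, \mathbf{y}

% Linear (projective) version, y_i = \mathbf{a}^{T}\mathbf{x}_i:
\min_{\mathbf{a}} \ \mathbf{a}^{T} X L X^{T} \mathbf{a}
\quad \text{s.t.} \quad \mathbf{a}^{T} X D X^{T} \mathbf{a} = 1
```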

Page 15

Manifold of Face Images

Expression (Sad >>> Happy)

Pose (Right >>> Left)

Page 16

Manifold of Handwritten Digits

Thickness

Slant

Page 17

Active and Semi-Supervised Learning: A Geometric Perspective

Linear Regression Model

Learning target:

Training Examples:
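The formulas for the learning target and training examples were lost in extraction; the standard setup they presumably showed, in notation consistent with the later slides (where the z_i are the measured points and w is the parameter vector), is:

```latex
% Learning target: a linear function of the input
f(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x}

% Training examples: k measured (labeled) points with noisy observations
\{(\mathbf{z}_i, y_i)\}_{i=1}^{k}, \qquad
y_i = \mathbf{w}^{T}\mathbf{z}_i + \epsilon_i, \quad
\epsilon_i \sim \mathcal{N}(0, \sigma^{2})
```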

Page 18

Generalization Error

Goal of regression: obtain a learned function that minimizes the generalization error (the expected error over unseen test input points).

Maximum Likelihood Estimate
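The maximum likelihood estimate itself was an image on the slide; under the Gaussian noise model it reduces to least squares. A sketch, with Z the d × k matrix whose columns are the measured points:

```latex
\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \sum_{i=1}^{k}
    (\mathbf{w}^{T}\mathbf{z}_i - y_i)^{2}
  = (Z Z^{T})^{-1} Z \mathbf{y}
```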

Page 19

Gauss-Markov Theorem

For a given x, the expected prediction error is:
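The expected-prediction-error formula was an image; for the least-squares estimate the standard decomposition (a sketch, assuming the noise model from the previous slides) is:

```latex
E\big[(y - \mathbf{x}^{T}\hat{\mathbf{w}})^{2}\big]
  = \underbrace{\sigma^{2}}_{\text{noise}}
  + \underbrace{\sigma^{2}\,\mathbf{x}^{T}(Z Z^{T})^{-1}\mathbf{x}}_{\text{variance of the estimator}}
```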

Page 20

Gauss-Markov Theorem (continued)

For a given x, the expected prediction error is:

[Figure: two regression fits plotted over the interval from -4 to 4; the left fit is labeled "Good!", the right fit "Bad!"]

Page 21

Experimental Design Methods

Three common scalar measures of the size of the parameter (w) covariance matrix:

A-optimal Design: trace of Cov(w).
D-optimal Design: determinant of Cov(w).
E-optimal Design: maximum eigenvalue of Cov(w).

Disadvantage: these methods fail to take into account the unmeasured (unlabeled) data points.
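A minimal numerical sketch of the three criteria, assuming the ordinary least-squares covariance Cov(w) = σ²(ZZᵀ)⁻¹ from the earlier slides (σ² defaults to 1 here purely for illustration):

```python
import numpy as np

def design_criteria(Z, sigma2=1.0):
    """Scalar measures of Cov(w) = sigma^2 (Z Z^T)^{-1} for a design Z (d x k)."""
    cov = sigma2 * np.linalg.inv(Z @ Z.T)
    return {
        "A-optimality": np.trace(cov),                  # average variance
        "D-optimality": np.linalg.det(cov),             # volume of confidence ellipsoid
        "E-optimality": np.linalg.eigvalsh(cov).max(),  # worst-case direction
    }
```

Each design method picks the measured points Z to make its criterion small; none of them looks at the unlabeled points, which is the disadvantage the slide notes.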

Page 22

Manifold Regularization: Semi-Supervised Setting

Measured (labeled) points: discriminant structure
Unmeasured (unlabeled) points: geometrical structure

Page 23

Manifold Regularization: Semi-Supervised Setting (continued)

random labeling

Page 24

Manifold Regularization: Semi-Supervised Setting (continued)

random labeling vs. active learning vs. active learning + semi-supervised learning

Page 25

Unlabeled Data to Estimate Geometry

Measured (labeled) points: discriminant structure

Page 26

Unlabeled Data to Estimate Geometry (continued)

Measured (labeled) points: discriminant structure
Unmeasured (unlabeled) points: geometrical structure

Compute nearest neighbor graph G

[Pages 27 to 31 repeat this slide, animating the construction of the graph edge by edge.]
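The steps above (build the neighbor graph from all points, then fit using only the labeled ones) can be sketched as follows; the graph Laplacian L = D - W and the linear Laplacian Regularized Least Squares solve are standard, but the helper names and the λ values are illustrative:

```python
import numpy as np

def graph_laplacian(W):
    """L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def laprls(Z, y, X, L, lam1=0.1, lam2=0.01):
    """Laplacian Regularized Least Squares, linear version.

    Z: (d, k) measured (labeled) points; y: (k,) labels.
    X: (d, m) all points (labeled + unlabeled); L: (m, m) graph Laplacian.
    Returns the weight vector w minimizing
        sum_i (w^T z_i - y_i)^2 + lam1 * w^T X L X^T w + lam2 * ||w||^2
    """
    d = Z.shape[0]
    H = Z @ Z.T + lam1 * (X @ L @ X.T) + lam2 * np.eye(d)
    return np.linalg.solve(H, Z @ y)
```

The unlabeled points enter only through the X L Xᵀ term, which is exactly how the geometrical structure is used.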

Page 32

Laplacian Regularized Least Squares (Belkin and Niyogi, 2006)

Linear objective function:

Solution:
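The objective and solution were images on this slide; they can be reconstructed from the notation that survives on the bias/variance slide (Z holds the k measured points, X all m points, L the graph Laplacian):

```latex
% Linear objective function:
\min_{\mathbf{w}} \ \sum_{i=1}^{k} (\mathbf{w}^{T}\mathbf{z}_i - y_i)^{2}
  + \lambda_1\, \mathbf{w}^{T} X L X^{T} \mathbf{w}
  + \lambda_2\, \|\mathbf{w}\|^{2}

% Solution:
\hat{\mathbf{w}} = (Z Z^{T} + \lambda_1 X L X^{T} + \lambda_2 I)^{-1} Z \mathbf{y}
```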

Page 33

Active Learning

How do we find the most representative points on the manifold?

Page 34

Active Learning (continued)

Objective: guide the selection of the subset of data points that gives the greatest amount of information.

Experimental design: select samples to label.

Manifold Regularized Experimental Design shares the same objective function as Laplacian Regularized Least Squares: simultaneously minimize the least-squares error on the measured samples and preserve the local geometrical structure of the data space.

Page 35

Analysis of Bias and Variance

In order to make the estimator as stable as possible, the size of the covariance matrix should be as small as possible.

D-optimality: minimize the determinant of the covariance matrix.

$\mathrm{Cov}(\mathbf{y}) = \sigma^{2} I$

$H = Z Z^{T} + \lambda_1 X L X^{T} + \lambda_2 I$

$\hat{\mathbf{w}} = (Z Z^{T} + \lambda_1 X L X^{T} + \lambda_2 I)^{-1} Z \mathbf{y} = H^{-1} Z \mathbf{y}$

$\mathrm{Cov}(\hat{\mathbf{w}}) = \sigma^{2}\, H^{-1} Z Z^{T} H^{-1}$

Page 36

Manifold Regularized Experimental Design

The algorithm (the points $\mathbf{z}_1, \dots, \mathbf{z}_k$ are selected from $\{\mathbf{x}_1, \dots, \mathbf{x}_m\}$):

1. Initialize $H_1 = \lambda_1 X L X^{T} + \lambda_2 I$ and select the first data point $\mathbf{z}_1$ such that $\mathbf{z}_1^{T} H_1^{-1} \mathbf{z}_1$ is maximized.

2. Suppose $k$ points have been selected; choose the $(k+1)$-th point as $\mathbf{z}_{k+1} = \arg\max_{\mathbf{z}} \ \mathbf{z}^{T} H_k^{-1} \mathbf{z}$.

3. Update
$H_{k+1}^{-1} = H_k^{-1} - \dfrac{H_k^{-1}\, \mathbf{z}_{k+1} \mathbf{z}_{k+1}^{T}\, H_k^{-1}}{1 + \mathbf{z}_{k+1}^{T} H_k^{-1} \mathbf{z}_{k+1}}$
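A runnable sketch of this greedy selection, maintaining the inverse of H via the rank-one (Sherman-Morrison) update from step 3; the λ values and data shapes are illustrative assumptions:

```python
import numpy as np

def manifold_regularized_design(X, L, n_select, lam1=0.1, lam2=0.01):
    """Greedy D-optimal selection with the rank-one (Sherman-Morrison) update.

    X: (d, m) candidate points (columns); L: (m, m) graph Laplacian.
    Returns indices of the selected columns of X.
    """
    d, m = X.shape
    # H_1 = lam1 * X L X^T + lam2 * I; we maintain its inverse directly.
    H_inv = np.linalg.inv(lam1 * (X @ L @ X.T) + lam2 * np.eye(d))
    selected = []
    for _ in range(n_select):
        # Score every candidate: z^T H^{-1} z.
        scores = np.einsum("dm,de,em->m", X, H_inv, X)
        scores[selected] = -np.inf  # do not pick a point twice
        j = int(np.argmax(scores))
        selected.append(j)
        z = X[:, j]
        Hz = H_inv @ z
        # Sherman-Morrison:
        # (H + z z^T)^{-1} = H^{-1} - (H^{-1} z z^T H^{-1}) / (1 + z^T H^{-1} z)
        H_inv = H_inv - np.outer(Hz, Hz) / (1.0 + z @ Hz)
    return selected
```

Maximizing zᵀH⁻¹z at each step greedily maximizes det(H + zzᵀ) = det(H)(1 + zᵀH⁻¹z), which is the D-optimality criterion from the previous slide.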

Page 37

Nonlinear Generalization in RKHS

Consider the feature space F induced by some nonlinear mapping φ, with ⟨φ(x_i), φ(x_j)⟩ = K(x_i, x_j).

K(·, ·): positive semi-definite kernel function.

Regression model in RKHS: $y = \boldsymbol{\nu}^{T} \varphi(\mathbf{x}), \ \boldsymbol{\nu} \in \mathcal{F}$, with $\boldsymbol{\nu} = \sum_{i=1}^{m} \alpha_i\, \varphi(\mathbf{x}_i)$.

Objective function in RKHS:
$J_{\mathrm{LapRLS}}(\boldsymbol{\nu}) = \sum_{i=1}^{k} (\boldsymbol{\nu}^{T}\varphi(\mathbf{z}_i) - y_i)^{2} + \dfrac{\lambda_1}{2} \sum_{i,j=1}^{m} (\boldsymbol{\nu}^{T}\varphi(\mathbf{x}_i) - \boldsymbol{\nu}^{T}\varphi(\mathbf{x}_j))^{2} S_{ij} + \lambda_2 \|\boldsymbol{\nu}\|_{\mathcal{F}}^{2}$

Page 38

Kernel Graph Regularized Experimental Design

Nonlinear generalization of the algorithm in RKHS, where $\mathbf{v}_1, \dots, \mathbf{v}_k$ are selected from (the kernel representations of) $\{\mathbf{x}_1, \dots, \mathbf{x}_m\}$:

$\mathrm{Cov}(\boldsymbol{\alpha}) = \sigma^{2} \left( K_{XZ} K_{ZX} + \lambda_1 K_{XX} L K_{XX} + \lambda_2 K_{XX} \right)^{-1}$

$\max_{Z = (\mathbf{z}_1, \dots, \mathbf{z}_k)} \left| K_{XZ} K_{ZX} + \lambda_1 K_{XX} L K_{XX} + \lambda_2 K_{XX} \right|$

1. Initialize $M_1 = \lambda_1 K_{XX} L K_{XX} + \lambda_2 K_{XX}$ and select the first data point $\mathbf{v}_1$ such that $\mathbf{v}_1^{T} M_1^{-1} \mathbf{v}_1$ is maximized.

2. Suppose $k$ points have been selected; choose the $(k+1)$-th point as $\mathbf{v}_{k+1} = \arg\max_{\mathbf{v}} \ \mathbf{v}^{T} M_k^{-1} \mathbf{v}$.

3. Update
$M_{k+1}^{-1} = M_k^{-1} - \dfrac{M_k^{-1}\, \mathbf{v}_{k+1} \mathbf{v}_{k+1}^{T}\, M_k^{-1}}{1 + \mathbf{v}_{k+1}^{T} M_k^{-1} \mathbf{v}_{k+1}}$

Page 39

A Synthetic Example

[Figure: points selected by A-optimal Design (left) versus Laplacian Regularized Optimal Design (right)]

Page 40

A Synthetic Example (continued)

[Figure: A-optimal Design versus Laplacian Regularized Optimal Design]

Page 41

Application to image/video compression

Page 42

Video compression

Page 43

Topology

Can we always map a manifold to a Euclidean space without changing its topology?

Page 44

Topology

Simplicial Complex
Homology Group
Betti Numbers
Euler Characteristic
Good Cover
Sample Points
Homotopy
Number of components, dimension, …

Page 45

Topology

The Euler Characteristic is a topological invariant: a number that describes one aspect of a topological space's shape or structure.

[Figure: example spaces with Euler characteristics 1, -2, 0, 1, 2, 0, 0]

The Euler Characteristic of Euclidean space is 1!

Page 46

Challenges

Insufficient sample points
Choosing a suitable radius
How to identify noisy holes (user interaction?)

[Figure: a noisy hole; homotopy versus homeomorphism]

Page 47

Q & A