
Page 1: Advanced Machine Learning & Perception

Tony Jebara, Columbia University

Advanced Machine Learning & Perception

Instructor: Tony Jebara

Page 2: Advanced Machine Learning & Perception

Topic 12

•Manifold Learning (Unsupervised)

•Beyond Principal Components Analysis (PCA)

•Multidimensional Scaling (MDS)

•Generative Topographic Map (GTM)

•Locally Linear Embedding (LLE)

•Convex Invariance Learning (CoIL)

•Kernel PCA (KPCA)

Page 3: Advanced Machine Learning & Perception

Manifolds

•Data is often embedded in a lower dimensional space
•Consider an image of a face being translated from left to right: $\vec{x}_t = T_t \vec{x}_0$
•How to capture the true coordinates of the data on the manifold or embedding space and represent it compactly?
•Open problem: many possible approaches…
•PCA: linear manifold
•MDS: get inter-point distances, find 2D data with the same distances
•LLE: mimic neighborhoods using low dimensional vectors
•GTM: fit a grid of Gaussians to data via a nonlinear warp
•Linear after nonlinear normalization/invariance of data
•Linear in Hilbert space (kernels)

Page 4: Advanced Machine Learning & Perception

Principal Components Analysis

•If we have eigenvectors, mean and coefficients: $\vec{x}_i \approx \vec{\mu} + \sum_j c_{ij}\vec{v}_j$
•Getting eigenvectors (i.e. approximating the covariance): $\Sigma = V \Lambda V^T$, e.g. in 3D:

$\begin{bmatrix}\Sigma_{11}&\Sigma_{12}&\Sigma_{13}\\ \Sigma_{12}&\Sigma_{22}&\Sigma_{23}\\ \Sigma_{13}&\Sigma_{23}&\Sigma_{33}\end{bmatrix} = \begin{bmatrix}\vec{v}_1&\vec{v}_2&\vec{v}_3\end{bmatrix} \begin{bmatrix}\lambda_1&0&0\\ 0&\lambda_2&0\\ 0&0&\lambda_3\end{bmatrix} \begin{bmatrix}\vec{v}_1&\vec{v}_2&\vec{v}_3\end{bmatrix}^T$

•Eigenvectors are orthonormal: $\vec{v}_i^T \vec{v}_j = \delta_{ij}$
•In the coordinates of v, the Gaussian is diagonal, cov = $\Lambda$
•All eigenvalues are non-negative: $\lambda_i \geq 0$
•Higher eigenvalues are higher variance, use those first: $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \lambda_4 \geq \cdots$
•To compute the coefficients: $c_{ij} = (\vec{x}_i - \vec{\mu})^T \vec{v}_j$
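A minimal numpy sketch of this recipe (function and variable names are my own, not from the slides):

import numpy as np

def pca(X, d):
    # X: (N, D) data matrix; returns mean, top-d eigenvectors, coefficients
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / len(X)          # sample covariance
    lam, V = np.linalg.eigh(Sigma)      # eigenvalues in ascending order
    top = np.argsort(lam)[::-1][:d]     # highest-variance directions first
    V = V[:, top]
    C = Xc @ V                          # coefficients c_ij = (x_i - mu)^T v_j
    return mu, V, C

# reconstruction: x_i ~= mu + sum_j c_ij v_j, i.e. X_hat = mu + C @ V.T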

Page 5: Advanced Machine Learning & Perception

Multidimensional Scaling (MDS)

•Idea: capture only the distances between points X in the original space
•Construct another set of low-dim or 2D Y points having the same distances
•A Dissimilarity d(x,y) is a function of two objects x and y such that
$d(x,y) \geq 0 \qquad d(x,x) = 0 \qquad d(x,y) = d(y,x)$
•A Metric also has to satisfy the triangle inequality:
$d(x,z) \leq d(x,y) + d(y,z)$
•Standard example: Euclidean l2 metric $d(x,y) = \frac{1}{2}\|x-y\|^2$
•Assume for N objects we compute a dissimilarity matrix which tells us how far apart they are: $\Delta_{ij} = d(X_i, X_j)$
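A small numpy sketch of building that dissimilarity matrix under the slide's half-squared-Euclidean d() (the function name is mine):

import numpy as np

def dissimilarity_matrix(X):
    # Delta_ij = 0.5 * ||x_i - x_j||^2 for all pairs, via the expansion
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i^T x_j
    sq = np.sum(X**2, axis=1)
    return 0.5 * (sq[:, None] + sq[None, :] - 2 * X @ X.T)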

Page 6: Advanced Machine Learning & Perception

Multidimensional Scaling

•Given dissimilarity Δ between the original X points under the original d() metric, find Y points whose dissimilarity D under another d'() metric is similar to Δ:
$\Delta_{ij} = d(X_i, X_j) \qquad D_{ij} = d'(Y_i, Y_j)$
•Want to find Y's that minimize some difference from D to Δ
•E.g. Least Squares Stress: $Stress(Y_1,\ldots,Y_N) = \sum_{ij}(D_{ij} - \Delta_{ij})^2$
•E.g. Invariant Stress: $InvStress = \frac{Stress(Y)}{\sum_{i<j} D_{ij}^2}$
•E.g. Sammon Mapping: $\sum_{ij}\frac{1}{\Delta_{ij}}(D_{ij} - \Delta_{ij})^2$
•E.g. Strain: $\mathrm{trace}\left(J(D^2 - \Delta^2)\, J (D^2 - \Delta^2)^T\right)$ where $J = I - \frac{1}{N}\vec{1}\vec{1}^T$
•Some are global, some are local; minimize by gradient descent
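A sketch of gradient descent on the least-squares stress, assuming both d() and d'() are plain Euclidean distance (the step size and all names are illustrative):

import numpy as np

def mds_descent(Delta, d=2, steps=1000, lr=1e-3, seed=0):
    # minimize Stress(Y) = sum_ij (D_ij - Delta_ij)^2 with D_ij = ||y_i - y_j||
    rng = np.random.default_rng(seed)
    N = Delta.shape[0]
    Y = rng.standard_normal((N, d))
    for _ in range(steps):
        diff = Y[:, None, :] - Y[None, :, :]   # (N, N, d) pairwise y_i - y_j
        D = np.linalg.norm(diff, axis=2)
        np.fill_diagonal(D, 1.0)               # avoid 0/0 on the diagonal
        G = (D - Delta) / D
        np.fill_diagonal(G, 0.0)
        Y -= lr * 4 * np.sum(G[:, :, None] * diff, axis=1)  # dStress/dY
    return Y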

Page 7: Advanced Machine Learning & Perception

MDS Example 3D to 2D

•Have distances from cities to cities; these lie on the surface of a sphere (the Earth) in 3D space
•Reconstructed 2D points on a plane capture the essential properties (poles?)

Page 8: Advanced Machine Learning & Perception

MDS Example Multi-D to 2D

•More elaborate example
•Have a correlation matrix between crimes; these are of arbitrary dimensionality
•Hack: convert correlation to dissimilarity and show the reconstructed Y

Page 9: Advanced Machine Learning & Perception

Locally Linear Embedding

•Instead of distances, look at the neighborhood of each point; preserve the reconstruction of each point from its neighbors in low dimension
•Find the K nearest neighbors of each point
•Describe each neighborhood by the best weights on the neighbors for reconstructing the point:
$\varepsilon(W) = \sum_i \left\|\vec{X}_i - \sum_j W_{ij}\vec{X}_j\right\|^2 \quad subject\ to \quad \sum_j W_{ij} = 1\ \forall i$
•Find the best vectors Y that still have the same weights:
$\Phi(Y) = \sum_i \left\|\vec{Y}_i - \sum_j W_{ij}\vec{Y}_j\right\|^2 \quad subject\ to \quad E\{Y\} = 0,\ Cov\{Y\} = I$
•Why?

Page 10: Advanced Machine Learning & Perception

Locally Linear Embedding

•Finding W's (convex combination of weights on neighbors):
$\varepsilon(W) = \sum_i \varepsilon_i(W_i) \quad where \quad \varepsilon_i(W_i) = \left\|\vec{X}_i - \sum_j W_{ij}\vec{X}_j\right\|^2$
•Using $\sum_j W_{ij} = 1$, rewrite each local error as a quadratic form:
$\varepsilon_i(W_i) = \left\|\vec{X}_i - \sum_j W_{ij}\vec{X}_j\right\|^2 = \left\|\sum_j W_{ij}(\vec{X}_i - \vec{X}_j)\right\|^2 = \sum_{jk} W_{ij}W_{ik}(\vec{X}_i - \vec{X}_j)^T(\vec{X}_i - \vec{X}_k) = \sum_{jk} W_{ij}W_{ik}C_{jk}$
•Minimize subject to the sum-to-one constraint with a Lagrange multiplier:
$W_i^* = \arg\min_w \tfrac{1}{2} w^T C w - \lambda(w^T\vec{1} - 1)$
1) Take deriv & set to 0: $Cw - \lambda\vec{1} = 0$
2) Solve linear system: $w = \lambda C^{-1}\vec{1}$
3) Find $\lambda$: from $\vec{1}^T w = 1$, $\lambda = \frac{1}{\vec{1}^T C^{-1}\vec{1}}$
4) Find w
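A sketch of those four steps for one point's weights (the small regularizer on C is a standard practical safeguard when K exceeds the input dimension, not something from the slide):

import numpy as np

def lle_weights(X, i, neighbors, reg=1e-3):
    # C_jk = (x_i - x_j)^T (x_i - x_k) over the K neighbors of point i
    Z = X[i] - X[neighbors]                          # (K, D) rows x_i - x_j
    C = Z @ Z.T
    C += reg * np.trace(C) * np.eye(len(neighbors))  # guard against singular C
    w = np.linalg.solve(C, np.ones(len(neighbors)))  # steps 1-2: Cw = 1 up to lambda
    return w / w.sum()                               # steps 3-4: enforce sum_j w_j = 1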

Page 11: Advanced Machine Learning & Perception

Locally Linear Embedding

•Finding Y's (new low-D points that agree with the W's):
$\Phi(Y) = \sum_i \left\|\vec{Y}_i - \sum_j W_{ij}\vec{Y}_j\right\|^2$
•Expanding the square collects everything into one quadratic form:
$\Phi(Y) = \sum_i \left(\vec{Y}_i - \sum_j W_{ij}\vec{Y}_j\right)^T \left(\vec{Y}_i - \sum_k W_{ik}\vec{Y}_k\right) = \sum_{jk} M_{jk}\, \vec{Y}_j^T\vec{Y}_k$
$where \quad M_{jk} = \delta_{jk} - W_{jk} - W_{kj} + \sum_i W_{ij}W_{ik}, \quad subject\ to\ Y\ being\ white$
•Solve for Y as the bottom d+1 eigenvectors of M
•Plot the Y values
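A sketch of this eigenvector step, assuming W has been assembled into a full N×N matrix with zeros outside each neighborhood:

import numpy as np

def lle_embedding(W, d=2):
    # M = (I - W)^T (I - W), i.e. M_jk = delta_jk - W_jk - W_kj + sum_i W_ij W_ik
    N = W.shape[0]
    IW = np.eye(N) - W
    M = IW.T @ IW
    lam, V = np.linalg.eigh(M)    # eigenvalues in ascending order
    # the very bottom eigenvector is the constant one (eigenvalue ~ 0);
    # discard it and keep the next d as the embedding coordinates
    return V[:, 1:d+1]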

Page 12: Advanced Machine Learning & Perception

LLE Examples

•Original X data are raw images
•Dots are reconstructed two-dimensional Y points

[Figure: high-dimensional image vectors mapped to 2D points Y]

Page 13: Advanced Machine Learning & Perception

LLEs

•Top = PCA
•Bottom = LLE

Page 14: Advanced Machine Learning & Perception

Generative Topographic Map

•A principled alternative to the Kohonen map
•Forms a generative model of the manifold; can sample from it, etc.
•Find a nonlinear mapping y() from a 2D grid of Gaussians
•Pick the params W of the mapping such that the mapped Gaussians in data space maximize the likelihood of the observed data
•Have two spaces: the data space t (old notation was X's) and the hidden latent space x (old notation was Y's)
•The mapping goes from latent space to observed space: $t_i \approx y(x_i, W)$

Page 15: Advanced Machine Learning & Perception

GTM as a Grid of Gaussians

•We choose our priors and conditionals for all variables of interest
•Assume Gaussian noise on the y() mapping:
$p(t\,|\,x, W, \beta) = \left(\frac{\beta}{2\pi}\right)^{D/2} \exp\left(-\frac{\beta}{2}\left\|y(x,W) - t\right\|^2\right)$
•Assume our prior latent variables are a grid model, equally spaced in latent space:
$p(x) = \frac{1}{K}\sum_{k=1}^{K}\delta(x - x_k)$
•Can now write out the full likelihood:
$L(W,\beta) = \sum_{n=1}^{N}\log p(t_n\,|\,W,\beta) = \sum_{n=1}^{N}\log\int p(t_n\,|\,x,W,\beta)\, p(x)\, dx$

Page 16: Advanced Machine Learning & Perception

GTM Distribution Model

•Integrating over the delta functions turns the integral into a summation:
$L(W,\beta) = \sum_{n=1}^{N}\log\int p(t_n\,|\,x,W,\beta)\,p(x)\,dx = \sum_{n=1}^{N}\log\frac{1}{K}\sum_{k=1}^{K} p(t_n\,|\,x_k,W,\beta)$
•Note the log-sum; need to apply EM to maximize
•Also, use the following parametric (linear in the basis) form of the mapping: $y(x,W) = W\phi(x)$
•Examples of manifolds for randomly chosen W mappings
•Typically, we are given the data and want to find the maximum likelihood mapping W for it…
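A sketch of evaluating this log-likelihood with y(x,W) = Wφ(x) (grid, phi, and all names are illustrative; EM would reuse these same per-component densities as responsibilities when updating W and β):

import numpy as np

def gtm_log_likelihood(T, grid, phi, W, beta):
    # T: (N, D) data; grid: (K, 2) latent grid; phi(grid): (K, M) basis values
    D = T.shape[1]
    Y = phi(grid) @ W.T                                     # (K, D) Gaussian centers in data space
    sq = ((T[:, None, :] - Y[None, :, :])**2).sum(axis=2)   # (N, K) squared distances
    log_p = 0.5 * D * np.log(beta / (2 * np.pi)) - 0.5 * beta * sq
    m = log_p.max(axis=1)                                   # log-sum-exp for stability
    lse = m + np.log(np.exp(log_p - m[:, None]).sum(axis=1))
    return np.sum(lse - np.log(len(grid)))                  # minus log K from the 1/K prior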

Page 17: Advanced Machine Learning & Perception

GTM Examples

•Recover the non-linear manifold by warping the grid with the W params
•Synthetic example: Left = initialized, Right = converged
•Real example: oil data, 3 classes; Left = GTM, Right = PCA