Nonlinear Unsupervised Feature
Learning
How Local Similarities Lead to Global Coding
Amirreza Shaban
Outline
Feature Learning
Coding methods: Vector Quantization, Sparse Coding, Local Coordinate Coding, Locality-constrained Linear Coding, Local Similarity Global Coding
Experiments
Feature Learning
The goal of feature learning is to convert a complex, high-dimensional nonlinear learning problem into a much simpler linear one.
The learned features capture the nonlinearity of the data structure, so that the problem can then be solved by a much simpler linear learning method.
This topic is closely related to nonlinear dimensionality reduction.
Coding Method
Coding methods are a class of algorithms that aim to find high-level representations of low-level features.
Given unlabeled input data $X = \{x_1, x_2, \dots, x_n\}$ and a codebook $C = \{b_1, b_2, \dots, b_m\}$ of $m$ atoms, the goal is to learn a coding vector $\gamma_i$ for each data point $x_i$, where each element indicates the affinity of the data point to the corresponding codebook atom.
Vector Quantization
Assign each data point to its nearest dictionary basis:
$$\gamma_{ij} = \begin{cases} 1 & \text{if } j = \arg\min_{j'} \|x_i - b_{j'}\|_2 \\ 0 & \text{otherwise} \end{cases}$$
The dictionary bases are the cluster centers learned by K-means.
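A minimal sketch of VQ coding in Python, assuming numpy and scikit-learn; the function name and parameters are illustrative, not from the talk:

```python
import numpy as np
from sklearn.cluster import KMeans

def vq_codes(X, n_atoms=3, random_state=0):
    """One-hot VQ coding: assign each point to its nearest K-means center."""
    km = KMeans(n_clusters=n_atoms, n_init=10, random_state=random_state).fit(X)
    codes = np.zeros((X.shape[0], n_atoms))
    codes[np.arange(X.shape[0]), km.labels_] = 1.0  # gamma_ij = 1 for the nearest atom
    return codes, km.cluster_centers_

X = np.random.rand(100, 2)
codes, atoms = vq_codes(X)   # each row of `codes` is a one-hot vector
```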
Vector Quantization
[Figure: three Voronoi regions R1, R2, R3 around the codebook atoms; points falling in each region receive the one-hot codes [1, 0, 0], [0, 1, 0], and [0, 0, 1], respectively.]
Sparse Coding
Each data point is represented by a linear combination of a small number of codebook atoms.
The coefficients are found by solving the following minimization problem:
$$\arg\min_{\gamma, C} \sum_i \|x_i - C\gamma_i\|^2 + \lambda\, |\gamma_i|_1$$
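A sketch of the coding step with a fixed codebook, using scikit-learn's Lasso as the l1 solver; the function name and the lambda-to-alpha scaling are assumptions of this sketch:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_codes(X, C, lam=0.1):
    """Solve gamma_i = argmin ||x_i - C g||^2 + lam * |g|_1 for each x_i.

    C: (d, m) codebook with atoms as columns. sklearn's Lasso minimizes
    (1/(2*n_samples))||y - Xw||^2 + alpha*|w|_1, so with n_samples = d
    we set alpha = lam / (2d) to match the objective above."""
    d = C.shape[0]
    lasso = Lasso(alpha=lam / (2 * d), fit_intercept=False, max_iter=5000)
    return np.stack([lasso.fit(C, x).coef_ for x in X])

rng = np.random.default_rng(0)
C = rng.standard_normal((20, 50))   # d = 20, m = 50 atoms
X = rng.standard_normal((10, 20))   # 10 data points
G = sparse_codes(X, C)              # (10, 50) sparse coefficient matrix
```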
Local Coordinate Coding
It is empirically observed that sparse coding performs better when the non-zero coefficients correspond to bases that are local to the encoded point.
The conclusion is that locality is more essential than sparsity.
$$\arg\min_{\gamma, C} \sum_i \|x_i - C\gamma_i\|^2 + \lambda \sum_j |\gamma_{ij}|\, \|x_i - c_j\|^2$$
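One way to solve the LCC coding step for a single point with the codebook fixed: the locality-weighted l1 penalty reduces to a plain lasso after rescaling each atom by its distance weight. A sketch under that reduction; names and the alpha scaling are assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

def lcc_code(x, C, lam=0.1, eps=1e-8):
    """argmin_g ||x - C g||^2 + lam * sum_j |g_j| * ||x - c_j||^2.

    Substituting b_j = w_j * g_j with w_j = ||x - c_j||^2 turns the weighted
    l1 term into a plain one: ||x - (C/w) b||^2 + lam * |b|_1."""
    w = np.sum((C - x[:, None]) ** 2, axis=0) + eps   # per-atom locality weights
    d = C.shape[0]
    lasso = Lasso(alpha=lam / (2 * d), fit_intercept=False, max_iter=5000)
    b = lasso.fit(C / w, x).coef_
    return b / w                                       # recover g_j = b_j / w_j
```

Distant atoms get large weights, so their coefficients are pushed to zero: locality emerges from the penalty itself.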
Local Coordinate Coding
Learning Method:
$$f(x_i) \approx f(l, x_i) = \sum_j \gamma_{ij}\, l_j$$
$$\hat{l} = \arg\min_{l, C} \sum_i \phi\big(f(l, x_i),\, y_i\big) + \lambda \sum_j \big(l_j - g(c_j)\big)^2$$
It is proved that LCC can learn an arbitrary smooth function on the manifold.
The rate of convergence depends only on the intrinsic dimensionality of the manifold, not on the ambient dimension $d$.
Locality-constrained Linear Coding
LCC has a high computational cost and is not suitable for large-scale learning problems.
LLC, first, guarantees locality by incorporating only the $k$ nearest bases in the coding process and, second, minimizes the reconstruction error on the local patches:
$$\gamma_i = \arg\min_{\gamma_i} \|x_i - C_{knn}\,\gamma_i\|^2$$
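LLC admits an analytical solution over the $k$ nearest atoms; the original LLC paper additionally constrains the coefficients to sum to one, which the sketch below assumes. A minimal numpy version with atoms as columns of C:

```python
import numpy as np

def llc_code(x, C, k=5):
    """LLC: least-squares reconstruction of x from its k nearest atoms,
    with coefficients constrained to sum to one (closed-form solution)."""
    d, m = C.shape
    idx = np.argsort(np.sum((C - x[:, None]) ** 2, axis=0))[:k]  # k nearest atoms
    B = C[:, idx].T - x                      # (k, d) local bases shifted to x
    G = B @ B.T                              # local covariance (Gram) matrix
    G += 1e-6 * np.trace(G) * np.eye(k)      # regularize for numerical stability
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                             # enforce the sum-to-one constraint
    gamma = np.zeros(m)
    gamma[idx] = w                           # zeros everywhere except the k-NN atoms
    return gamma
```

Because only a small $k \times k$ system is solved per point, the coding step is fast enough for large-scale problems.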
Locality-constrained method drawback
Incapable of representing similarity between non-neighbor points:
$$K(x, y) = K(x, z) = 0$$
Locality-constrained method drawback
The SVM labeling function can be written as:
$$f(x) = \sum_i \alpha_i K(x, x_i)$$
For those points $x$ with $K(x, x_i) = 0$ for all $i$, the SVM fails to predict the label of $x$.
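A small numeric illustration of this failure mode, assuming a toy compactly supported local kernel and made-up dual coefficients (all names here are illustrative):

```python
import numpy as np

def local_kernel(x, y, radius=1.0):
    """Toy local kernel: positive inside `radius`, exactly zero beyond it."""
    dist = np.linalg.norm(x - y)
    return max(0.0, 1.0 - dist / radius)

X_train = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])
alpha = np.array([1.0, -0.5, 0.7])   # hypothetical learned dual coefficients

x_far = np.array([10.0, 10.0])       # a non-neighbor of every training point
f = sum(a * local_kernel(x_far, xi) for a, xi in zip(alpha, X_train))
print(f)                             # 0.0 -> the decision function is uninformative
```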
Local Similarity Global Coding
The idea is to propagate the coding coefficients along the data manifold:
$$\gamma^t(x) = \big[\,p_t(x, c_1),\; p_t(x, c_2),\; \dots,\; p_t(x, c_n)\,\big]^\top$$
where $p_t(x, c_i)$ is the $t$-step diffusion similarity between $x$ and atom $c_i$.
When $t = 1$, $\gamma^1(x)$ is similar to recent locality-constrained coding methods.
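A sketch of this transductive coding, assuming the $t$-step similarities come from a random walk on a Gaussian-affinity graph over the data; the affinity choice and names are assumptions of this sketch:

```python
import numpy as np

def lsgc_codes(X, t=3, sigma=0.5):
    """gamma^t(x_i) is the i-th row of P^t: t-step walk probabilities."""
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-D2 / (2 * sigma ** 2))       # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    return np.linalg.matrix_power(P, t)      # row i = coding of x_i
```

Larger $t$ lets the coefficients spread along the manifold, so two points with no common neighbors can still receive similar codes.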
Inductive LSGC
The kernel function is computed as:
$$K_{2t}(x, y) = \gamma^t(x)^\top\, \gamma^t(y)$$
It is referred to as the diffusion kernel of order $t$.
The similarity is high if $x$ and $y$ are connected to each other by many paths in the graph.
It is known that $t$ controls the resolution at which we look at the data.
Computing the required powers of the transition matrix $P$ has a high computational cost of $O(n^3)$.
Inductive LSGC
A two-step process:
Projection: find the vector $f$ in which each element represents the one-step similarity between data point $x$ and basis $b_i$, i.e. $f_i = p_1(x, b_i)$.
Mapping: propagate the one-step similarities in $f$ to the other bases by a $(t-1)$-step diffusion process.
Inductive LSGC
The coding coefficient of data point $x$ on basis $b_i$ is defined as:
$$q_t(x, b_i) = \sum_{k=1}^{n} p_1(x, b_k)\, p_{t-1}(b_k, b_i)$$
and the overall coding can be written as:
$$\gamma^t(x) = \big[\,q_t(x, b_1),\; q_t(x, b_2),\; \dots,\; q_t(x, b_n)\,\big]^\top = P^{t-1} f$$
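A sketch of the inductive coding of an unseen point, reusing a transition matrix P built as in the earlier sketch; atoms here are rows of B, and the one-step similarity is again a Gaussian assumption:

```python
import numpy as np

def lsgc_code_new_point(x, B, P, t=3, sigma=0.5):
    """q_t(x, b_i) = sum_k p_1(x, b_k) * p_{t-1}(b_k, b_i)."""
    w = np.exp(-np.sum((B - x) ** 2, axis=1) / (2 * sigma ** 2))
    f = w / w.sum()                                  # one-step probabilities p_1(x, b_k)
    # (t-1)-step propagation; equals the slide's P^{t-1} f under a
    # row-vector convention for f.
    return np.linalg.matrix_power(P, t - 1).T @ f
```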
Inductive to Transductive convergence
$p$ and $q$ are related by:
$$p_t(x, b_k) = q_t(x, b_k) + r(x, b_k)$$
and the relative error $\frac{r(x, b_k)}{p_t(x, b_k)}$ converges to zero at the rate of $O(\frac{1}{n})$.
Experiments
Thank you for your attention!