Nonlinear Unsupervised Feature
Learning
How Local Similarities Lead to Global Coding
Amirreza Shaban
Outline
Feature Learning
Coding methods: Vector Quantization, Sparse Coding, Local Coordinate Coding, Locality-constrained Linear Coding, Local Similarity Global Coding
Experiments
Feature Learning
The goal of feature learning is to convert a complex, high-dimensional nonlinear learning problem into a much simpler linear one.
The learned features capture the nonlinearity of the data structure, so that the problem can then be solved by a much simpler linear learning method.
This topic is closely related to nonlinear dimensionality reduction.
Coding Method
Coding methods are a class of algorithms that aim to find high-level representations of low-level features.
Given unlabeled input data $X = \{x_1, x_2, \dots, x_n\}$ and a codebook $C = \{b_1, b_2, \dots, b_m\}$ of $m$ atoms, the goal is to learn a coding vector $\gamma_i$ for each data point $x_i$, where each element indicates the affinity of the data point to the corresponding codebook atom.
Vector Quantization
Assign each data point to its nearest dictionary basis:
$$\gamma_{ij} = \begin{cases} 1 & \text{if } j = \arg\min_{j'} \|x_i - b_{j'}\|_2 \\ 0 & \text{otherwise} \end{cases}$$
The dictionary bases are the cluster centers learned by K-means.
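A minimal sketch of VQ coding in Python, assuming numpy and scikit-learn; the function name and parameters are illustrative, not from the talk:

```python
import numpy as np
from sklearn.cluster import KMeans

def vq_codes(X, n_atoms=3, random_state=0):
    """One-hot VQ coding: assign each point to its nearest K-means center."""
    km = KMeans(n_clusters=n_atoms, n_init=10, random_state=random_state).fit(X)
    codes = np.zeros((X.shape[0], n_atoms))
    codes[np.arange(X.shape[0]), km.labels_] = 1.0  # gamma_ij = 1 for the nearest atom
    return codes, km.cluster_centers_

X = np.random.rand(100, 2)
codes, atoms = vq_codes(X)   # each row of `codes` is a one-hot vector
```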
Vector Quantization
[Figure: three Voronoi regions R1, R2, R3 around the codebook atoms; points falling in each region receive the one-hot codes [1, 0, 0], [0, 1, 0], and [0, 0, 1], respectively.]
Sparse Coding
Each data point is represented by a linear combination of a small number of codebook atoms.
The coefficients are found by solving the following minimization problem:
$$\arg\min_{\gamma, C} \sum_i \|x_i - C\gamma_i\|^2 + \lambda\, |\gamma_i|_1$$
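A sketch of the coding step with a fixed codebook, using scikit-learn's Lasso as the l1 solver; the function name and the lambda-to-alpha scaling are assumptions of this sketch:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_codes(X, C, lam=0.1):
    """Solve gamma_i = argmin ||x_i - C g||^2 + lam * |g|_1 for each x_i.

    C: (d, m) codebook with atoms as columns. sklearn's Lasso minimizes
    (1/(2*n_samples))||y - Xw||^2 + alpha*|w|_1, so with n_samples = d
    we set alpha = lam / (2d) to match the objective above."""
    d = C.shape[0]
    lasso = Lasso(alpha=lam / (2 * d), fit_intercept=False, max_iter=5000)
    return np.stack([lasso.fit(C, x).coef_ for x in X])

rng = np.random.default_rng(0)
C = rng.standard_normal((20, 50))   # d = 20, m = 50 atoms
X = rng.standard_normal((10, 20))   # 10 data points
G = sparse_codes(X, C)              # (10, 50) sparse coefficient matrix
```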
Local Coordinate Coding
It is empirically observed that sparse coding performs better when the non-zero coefficients correspond to bases that are local to the encoded point.
The conclusion is that locality is more essential than sparsity.
$$\arg\min_{\gamma, C} \sum_i \|x_i - C\gamma_i\|^2 + \lambda \sum_j |\gamma_{ij}|\, \|x_i - c_j\|^2$$
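One way to solve the LCC coding step for a single point with the codebook fixed: the locality-weighted l1 penalty reduces to a plain lasso after rescaling each atom by its distance weight. A sketch under that reduction; names and the alpha scaling are assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

def lcc_code(x, C, lam=0.1, eps=1e-8):
    """argmin_g ||x - C g||^2 + lam * sum_j |g_j| * ||x - c_j||^2.

    Substituting b_j = w_j * g_j with w_j = ||x - c_j||^2 turns the weighted
    l1 term into a plain one: ||x - (C/w) b||^2 + lam * |b|_1."""
    w = np.sum((C - x[:, None]) ** 2, axis=0) + eps   # per-atom locality weights
    d = C.shape[0]
    lasso = Lasso(alpha=lam / (2 * d), fit_intercept=False, max_iter=5000)
    b = lasso.fit(C / w, x).coef_
    return b / w                                       # recover g_j = b_j / w_j
```

Distant atoms get large weights, so their coefficients are pushed to zero: locality emerges from the penalty itself.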
Local Coordinate Coding
Learning Method:
$$f(x_i) \approx f(l, x_i) = \sum_j \gamma_{ij}\, l_j$$
$$\hat{l} = \arg\min_{l, C} \sum_i \phi\big(f(l, x_i),\, y_i\big) + \lambda \sum_j \big(l_j - g(c_j)\big)^2$$
It is proved that LCC can learn an arbitrary smooth function on the manifold.
The rate of convergence depends only on the intrinsic dimensionality of the manifold, not on the ambient dimension $d$.
Locality-constrained Linear Coding
LCC has a high computational cost and is not suitable for large-scale learning problems.
LLC, first, guarantees locality by incorporating only the $k$ nearest bases in the coding process and, second, minimizes the reconstruction error on the local patches:
$$\gamma_i = \arg\min_{\gamma_i} \|x_i - C_{knn}\,\gamma_i\|^2$$
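LLC admits an analytical solution over the $k$ nearest atoms; the original LLC paper additionally constrains the coefficients to sum to one, which the sketch below assumes. A minimal numpy version with atoms as columns of C:

```python
import numpy as np

def llc_code(x, C, k=5):
    """LLC: least-squares reconstruction of x from its k nearest atoms,
    with coefficients constrained to sum to one (closed-form solution)."""
    d, m = C.shape
    idx = np.argsort(np.sum((C - x[:, None]) ** 2, axis=0))[:k]  # k nearest atoms
    B = C[:, idx].T - x                      # (k, d) local bases shifted to x
    G = B @ B.T                              # local covariance (Gram) matrix
    G += 1e-6 * np.trace(G) * np.eye(k)      # regularize for numerical stability
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                             # enforce the sum-to-one constraint
    gamma = np.zeros(m)
    gamma[idx] = w                           # zeros everywhere except the k-NN atoms
    return gamma
```

Because only a small $k \times k$ system is solved per point, the coding step is fast enough for large-scale problems.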
Locality-constrained method drawback
Incapable of representing similarity between non-neighbor points:
$$K(x, y) = K(x, z) = 0$$
Locality-constrained method drawback
The SVM labeling function can be written as:
$$f(x) = \sum_i \alpha_i K(x, x_i)$$
For those points $x$ with $K(x, x_i) = 0$ for all $i$, the SVM fails to predict the label of $x$.
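A small numeric illustration of this failure mode, assuming a toy compactly supported local kernel and made-up dual coefficients (all names here are illustrative):

```python
import numpy as np

def local_kernel(x, y, radius=1.0):
    """Toy local kernel: positive inside `radius`, exactly zero beyond it."""
    dist = np.linalg.norm(x - y)
    return max(0.0, 1.0 - dist / radius)

X_train = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])
alpha = np.array([1.0, -0.5, 0.7])   # hypothetical learned dual coefficients

x_far = np.array([10.0, 10.0])       # a non-neighbor of every training point
f = sum(a * local_kernel(x_far, xi) for a, xi in zip(alpha, X_train))
print(f)                             # 0.0 -> the decision function is uninformative
```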
Local Similarity Global Coding
The idea is to propagate the coding coefficients along the data manifold:
$$\gamma^t(x) = \big[\,p_t(x, c_1),\; p_t(x, c_2),\; \dots,\; p_t(x, c_n)\,\big]^\top$$
where $p_t(x, c_i)$ is the $t$-step diffusion similarity between $x$ and atom $c_i$.
When $t = 1$, $\gamma^1(x)$ is similar to recent locality-constrained coding methods.
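A sketch of this transductive coding, assuming the $t$-step similarities come from a random walk on a Gaussian-affinity graph over the data; the affinity choice and names are assumptions of this sketch:

```python
import numpy as np

def lsgc_codes(X, t=3, sigma=0.5):
    """gamma^t(x_i) is the i-th row of P^t: t-step walk probabilities."""
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-D2 / (2 * sigma ** 2))       # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    return np.linalg.matrix_power(P, t)      # row i = coding of x_i
```

Larger $t$ lets the coefficients spread along the manifold, so two points with no common neighbors can still receive similar codes.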
Inductive LSGC
The kernel function is computed as:
$$K_{2t}(x, y) = \gamma^t(x)^\top\, \gamma^t(y)$$
It is referred to as the diffusion kernel of order $t$.
The similarity is high if $x$ and $y$ are connected to each other by many paths in the graph.
It is known that $t$ controls the resolution at which we look at the data.
Computing the required powers of the transition matrix $P$ has a high computational cost of $O(n^3)$.
Inductive LSGC
A two-step process:
Projection: find the vector $f$ in which each element represents the one-step similarity between data point $x$ and basis $b_i$, i.e. $f_i = p_1(x, b_i)$.
Mapping: propagate the one-step similarities in $f$ to the other bases by a $(t-1)$-step diffusion process.
Inductive LSGC
The coding coefficient of data point $x$ on basis $b_i$ is defined as:
$$q_t(x, b_i) = \sum_{k=1}^{n} p_1(x, b_k)\, p_{t-1}(b_k, b_i)$$
and the overall coding can be written as:
$$\gamma^t(x) = \big[\,q_t(x, b_1),\; q_t(x, b_2),\; \dots,\; q_t(x, b_n)\,\big]^\top = P^{t-1} f$$
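A sketch of the inductive coding of an unseen point, reusing a transition matrix P built as in the earlier sketch; atoms here are rows of B, and the one-step similarity is again a Gaussian assumption:

```python
import numpy as np

def lsgc_code_new_point(x, B, P, t=3, sigma=0.5):
    """q_t(x, b_i) = sum_k p_1(x, b_k) * p_{t-1}(b_k, b_i)."""
    w = np.exp(-np.sum((B - x) ** 2, axis=1) / (2 * sigma ** 2))
    f = w / w.sum()                                  # one-step probabilities p_1(x, b_k)
    # (t-1)-step propagation; equals the slide's P^{t-1} f under a
    # row-vector convention for f.
    return np.linalg.matrix_power(P, t - 1).T @ f
```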
Inductive to Transductive convergence
$p$ and $q$ are related by:
$$p_t(x, b_k) = q_t(x, b_k) + r(x, b_k)$$
and the relative error $\frac{r(x, b_k)}{p_t(x, b_k)}$ converges to zero at the rate of $O(\frac{1}{n})$.
Experiments
Thank you for your attention!