
Page 1: Invariance and Stability of Deep Convolutional Representations

Invariance and Stability of Deep Convolutional Representations

Alberto Bietti, Julien Mairal

Univ. Grenoble Alpes, Inria

Presented by Liqun Chen

Jan 11th, 2019

1

Page 2: Invariance and Stability of Deep Convolutional Representations

Outline

1 Introduction

2 Notation and basic mathematical tools

3 Construction of the Multilayer Convolutional Kernel Network (CKN)
      Patch extraction operator
      Kernel mapping operator
      Pooling operator
      Multilayer construction

4 Stability to deformations

5 Link with CNN

2

Page 3: Invariance and Stability of Deep Convolutional Representations

Introduction

Outline

1 Introduction

2 Notation and basic mathematical tools

3 Construction of the Multilayer Convolutional Kernel Network (CKN)
      Patch extraction operator
      Kernel mapping operator
      Pooling operator
      Multilayer construction

4 Stability to deformations

5 Link with CNN

3

Page 4: Invariance and Stability of Deep Convolutional Representations

Introduction

Introduction

Motivation

Understanding the geometry of the functional spaces induced by deep convolutional architectures is a fundamental question.

Representations that are stable to small deformations lead to robust models that can exploit these invariances, potentially with reduced sample complexity.

Related work

The scattering transform is a recent attempt to characterize convolutional multilayer architectures based on wavelets.

Scattering transform networks do not involve “learning”, since the filters of the networks are pre-defined.

4

Page 5: Invariance and Stability of Deep Convolutional Representations

Introduction

Contribution of this work

This paper studies the translation-invariance of the kernel representation and its stability to the action of diffeomorphisms, obtaining guarantees similar to those of the scattering transform while preserving signal information.

5

Page 6: Invariance and Stability of Deep Convolutional Representations

Notation and basic mathematical tools

Outline

1 Introduction

2 Notation and basic mathematical tools

3 Construction of the Multilayer Convolutional Kernel Network (CKN)
      Patch extraction operator
      Kernel mapping operator
      Pooling operator
      Multilayer construction

4 Stability to deformations

5 Link with CNN

6

Page 7: Invariance and Stability of Deep Convolutional Representations

Notation and basic mathematical tools

Notation and basic mathematical tools (I)

1 A positive definite kernel K that operates on a set X implicitly defines a reproducing kernel Hilbert space (RKHS) H of functions from X to R, along with a mapping φ : X → H;

2 A predictive model associates to every point z in X a label in R. It consists of a linear function f in H such that f(z) = 〈f, φ(z)〉H, where φ(z) is the data representation.

3 Given two points z, z′ ∈ X, the Cauchy–Schwarz inequality allows us to control the variation of the model f: |f(z) − f(z′)| ≤ ‖f‖H ‖φ(z) − φ(z′)‖H. If φ(z) and φ(z′) are close to each other in the RKHS norm, the model outputs similar predictions, provided the model f has a reasonably small norm in H.
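As a quick numerical sanity check of this bound (my own toy sketch, not from the slides), take a Gaussian kernel and a kernel model f = Σi αi K(xi, ·); then ‖f‖H² = αᵀKα and ‖φ(z) − φ(z′)‖H² = K(z,z) + K(z′,z′) − 2K(z,z′), so the inequality can be verified directly. All data and the bandwidth below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# A kernel model f = sum_i alpha_i K(x_i, .) with random anchors and weights.
X = rng.normal(size=(20, 5))
alpha = rng.normal(size=20)
norm_f = np.sqrt(alpha @ rbf(X, X) @ alpha)            # ||f||_H

z, zp = rng.normal(size=(2, 5))
lhs = abs(alpha @ rbf(X, z[None]) - alpha @ rbf(X, zp[None])).item()  # |f(z) - f(z')|

# ||phi(z) - phi(z')||_H computed through the kernel trick.
dphi = np.sqrt(rbf(z[None], z[None]) + rbf(zp[None], zp[None])
               - 2 * rbf(z[None], zp[None])).item()

print(lhs <= norm_f * dphi + 1e-12)                    # the Cauchy-Schwarz bound holds
```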

7

Page 8: Invariance and Stability of Deep Convolutional Representations

Notation and basic mathematical tools

Notation and basic mathematical tools (II)

1 A signal x is a function in L2(Ω, H), where Ω is a subset of R^d representing spatial coordinates.

2 Given a linear operator T : L2(Ω, H) → L2(Ω, H′), the operator norm is defined as ‖T‖_{L2(Ω,H)→L2(Ω,H′)} := sup_{‖x‖_{L2(Ω,H)} ≤ 1} ‖Tx‖_{L2(Ω,H′)}.

3 For simplicity, | · | denotes the Euclidean norm on R^d and ‖ · ‖ the Hilbert space norm.
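A small illustration of the sup definition (my own addition): for a finite-dimensional linear operator, the operator norm equals the largest singular value, and the supremum over random unit vectors can only approach it from below.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(6, 4))                    # a linear operator R^4 -> R^6

# Operator norm = sup_{||x|| <= 1} ||T x||, i.e. the largest singular value of T.
op_norm = np.linalg.svd(T, compute_uv=False)[0]

xs = rng.normal(size=(10000, 4))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)        # random unit vectors
empirical_sup = np.linalg.norm(xs @ T.T, axis=1).max() # sup over the finite sample

print(empirical_sup <= op_norm + 1e-12)                # never exceeds the true norm
```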

8

Page 9: Invariance and Stability of Deep Convolutional Representations

Construction of the Multilayer Convolutional Kernel Network (CKN)

Outline

1 Introduction

2 Notation and basic mathematical tools

3 Construction of the Multilayer Convolutional Kernel Network (CKN)
      Patch extraction operator
      Kernel mapping operator
      Pooling operator
      Multilayer construction

4 Stability to deformations

5 Link with CNN

9

Page 10: Invariance and Stability of Deep Convolutional Representations

Construction of the Multilayer Convolutional Kernel Network (CKN)

Model

[Figure 1 (not reproduced): the multilayer construction, alternating patch extraction Pk, kernel mapping Mk, and pooling Ak.]

10

Page 11: Invariance and Stability of Deep Convolutional Representations

Construction of the Multilayer Convolutional Kernel Network (CKN)

Framework of the model

As shown in Figure 1, a new map xk is built from the previous one xk–1 by applying successively three operators that perform patch extraction (Pk), kernel mapping (Mk) in a new RKHS Hk, and linear pooling (Ak), respectively. When going up in the hierarchy, the points xk(u) carry information from larger signal neighborhoods centered at u in Ω, with more invariance, as we will formally show.

11

Page 12: Invariance and Stability of Deep Convolutional Representations

Construction of the Multilayer Convolutional Kernel Network (CKN): Patch extraction operator

Patch extraction operator

12

Page 13: Invariance and Stability of Deep Convolutional Representations

Construction of the Multilayer Convolutional Kernel Network (CKN): Patch extraction operator

Patch extraction operator

Given the layer xk–1, we consider a patch shape Sk, defined as a compact centered subset of Ω, e.g., a box.

We define the Hilbert space Pk := L2(Sk, Hk–1), equipped with the norm ‖z‖² = ∫_{Sk} ‖z(u)‖² dνk(u) for every z in Pk, where dνk is the normalized uniform measure on Sk.

We define the (linear) patch extraction operator Pk : L2(Ω, Hk–1) → L2(Ω, Pk) such that, for all u in Ω,

Pk xk–1(u) = (v ↦ xk–1(u + v))_{v∈Sk} ∈ Pk.

Note that, because Pk is equipped with a normalized measure, Fubini's theorem gives ‖Pk xk–1‖ = ‖xk–1‖, and hence Pk xk–1 is in L2(Ω, Pk).
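To make this concrete, here is a toy numpy sketch of Pk (my own illustration, not the authors' code) for a 1-D discrete signal with H = R: patches of size s are extracted around each position with periodic boundary conditions, and the 1/s weight plays the role of the normalized measure νk, so that ‖Pk x‖ = ‖x‖ as in the Fubini argument above.

```python
import numpy as np

def extract_patches(x, s):
    """P x: at each position u, the centered patch (x[u+v])_{v in S} of size s,
    with periodic boundaries and each coordinate weighted by sqrt(1/s)."""
    n = len(x)
    offsets = np.arange(s) - s // 2                # centered patch shape S
    idx = (np.arange(n)[:, None] + offsets) % n
    return x[idx] / np.sqrt(s)

rng = np.random.default_rng(0)
x = rng.normal(size=128)
Px = extract_patches(x, s=9)

# Norm preservation: sum_u (1/s) sum_v x[u+v]^2 = ||x||^2 (Fubini with the
# normalized uniform measure on S).
print(np.allclose(np.sum(Px ** 2), np.sum(x ** 2)))
```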

13

Page 14: Invariance and Stability of Deep Convolutional Representations

Construction of the Multilayer Convolutional Kernel Network (CKN): Kernel mapping operator

Kernel mapping operator

14

Page 15: Invariance and Stability of Deep Convolutional Representations

Construction of the Multilayer Convolutional Kernel Network (CKN): Kernel mapping operator

Kernel mapping operator

Then, we map each patch of xk–1 to an RKHS Hk using the kernel mapping φk : Pk → Hk associated with a positive definite kernel Kk that operates on patches.

We can define the non-linear pointwise operator Mk such that for all u in Ω,

MkPkxk–1(u) := φk(Pkxk–1(u)) ∈ Hk.

This paper uses homogeneous dot-product kernels of the form:

Kk(z, z′) = ‖z‖ ‖z′‖ κk( 〈z, z′〉 / (‖z‖ ‖z′‖) ) = 〈φk(z), φk(z′)〉,   (1)

where κk(u) = Σ_{j=0}^{∞} bj u^j with bj ≥ 0 and κk(1) = 1, which ensures that ‖Mk Pk xk–1(u)‖ = ‖Pk xk–1(u)‖ and that Mk Pk xk–1 is in L2(Ω, Hk).
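For illustration (my own sketch), a valid choice is κk(u) = exp(u − 1): its Taylor coefficients bj = e^{−1}/j! are nonnegative and κk(1) = 1, so the resulting homogeneous kernel satisfies Kk(z, z) = ‖z‖², which is exactly the norm-preservation property used above.

```python
import numpy as np

def homogeneous_kernel(z, zp, kappa=lambda u: np.exp(u - 1.0)):
    """K(z, z') = ||z|| ||z'|| kappa(<z, z'> / (||z|| ||z'||)), for nonzero patches."""
    nz, nzp = np.linalg.norm(z), np.linalg.norm(zp)
    return nz * nzp * kappa(np.dot(z, zp) / (nz * nzp))

rng = np.random.default_rng(0)
z, zp = rng.normal(size=(2, 25))                      # two flattened patches

print(np.isclose(homogeneous_kernel(z, z), np.dot(z, z)))  # K(z, z) = ||z||^2
print(homogeneous_kernel(z, zp))                           # similarity between patches
```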

15

Page 16: Invariance and Stability of Deep Convolutional Representations

Construction of the Multilayer Convolutional Kernel Network (CKN): Kernel mapping operator

Kernel mapping operator

Convolutional Kernel Networks approximation

Approximate φk(z) by its projection onto span(φk(z1), ..., φk(zp))

Leads to tractable, p-dimensional representation ψk(z)

Anchor points z1, . . . , zp can be learned from data (K-means or backprop)
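A minimal sketch of this projection step, assuming the standard Nyström construction (not necessarily the exact recipe of the slides): with anchor points Z = {z1, ..., zp}, the projection of φ(z) onto span(φ(z1), ..., φ(zp)) has coordinates ψ(z) = K_ZZ^{−1/2} kZ(z) in an orthonormal basis of that span, so that 〈ψ(z), ψ(z′)〉 approximates K(z, z′). A Gaussian kernel and random anchors are used here purely for illustration.

```python
import numpy as np

def kernel(A, B, gamma=0.5):
    """Toy Gaussian kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
Z = rng.normal(size=(10, 5))                 # p = 10 anchor points (e.g. from K-means)

# Inverse square root of K_ZZ via its eigendecomposition (K_ZZ is PSD).
w, V = np.linalg.eigh(kernel(Z, Z) + 1e-8 * np.eye(len(Z)))
Kzz_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def psi(x):
    """Finite-dimensional feature map psi(x) = K_ZZ^{-1/2} k_Z(x)."""
    return Kzz_inv_sqrt @ kernel(Z, x[None]).ravel()

x, y = rng.normal(size=(2, 5))
print(psi(x) @ psi(y), kernel(x[None], y[None]).item())   # Nystrom approximation of K(x, y)
```

In the convolutional kernel network papers, the anchor points play the role of the learned filters; as the bullet above notes, they can come from K-means on patches or from backpropagation.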

16

Page 17: Invariance and Stability of Deep Convolutional Representations

Construction of the Multilayer Convolutional Kernel Network (CKN): Pooling operator

Pooling operator

17

Page 18: Invariance and Stability of Deep Convolutional Representations

Construction of the Multilayer Convolutional Kernel Network (CKN): Pooling operator

Pooling operator

The last step to build the layer xk consists of pooling neighboring values to achieve local shift-invariance.

We apply a linear convolution operator Ak with a Gaussian filter at scale σk, hσk(u) := σk^{−d} h(u/σk), where h(u) = (2π)^{−d/2} exp(−|u|²/2).

Then, for all u in Ω,

xk(u) = Ak Mk Pk xk–1(u) = ∫_{R^d} hσk(u − v) Mk Pk xk–1(v) dv ∈ Hk,   (2)

Applying Schur's test, we obtain ‖Ak‖ ≤ 1. Thus, xk is in L2(Ω, Hk), with ‖xk‖ = ‖Ak Mk Pk xk–1‖ ≤ ‖Mk Pk xk–1‖.
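A discrete 1-D check of this non-expansiveness (my own sketch): a Gaussian pooling filter normalized to sum to one defines a circular convolution whose operator norm is at most 1, so pooling can only shrink the signal norm, mirroring what Schur's test gives in the continuous setting.

```python
import numpy as np

def gaussian_filter(sigma):
    """Discrete Gaussian pooling filter h_sigma, normalized to sum to 1."""
    r = int(4 * sigma)
    u = np.arange(-r, r + 1)
    h = np.exp(-u ** 2 / (2 * sigma ** 2))
    return h / h.sum()

def pool(x, sigma):
    """A x: circular convolution of x with the normalized Gaussian filter."""
    h = gaussian_filter(sigma)
    r = len(h) // 2
    offsets = np.arange(-r, r + 1)
    idx = (np.arange(len(x))[:, None] - offsets[None, :]) % len(x)
    return x[idx] @ h

rng = np.random.default_rng(0)
x = rng.normal(size=256)
print(np.linalg.norm(pool(x, sigma=3.0)) <= np.linalg.norm(x))   # ||A x|| <= ||x||
```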

18

Page 19: Invariance and Stability of Deep Convolutional Representations

Construction of the Multilayer Convolutional Kernel Network (CKN): Pooling operator

Recap

19

Page 20: Invariance and Stability of Deep Convolutional Representations

Construction of the Multilayer Convolutional Kernel Network (CKN): Multilayer construction

Multilayer construction

Finally, we obtain a multilayer representation by composing the previous operators multiple times. In order to increase invariance with each layer, the size of the patch Sk and the pooling scale σk grow exponentially with k, with σk and the patch size sup_{c∈Sk} |c| of the same order. With n layers, the map xn may then be written

Φn(x0) := xn = An Mn Pn An–1 Mn–1 Pn–1 · · · A1 M1 P1 x0 ∈ L2(Ω, Hn).   (3)
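Putting the three operators together, here is a toy end-to-end sketch (my own illustration, not the authors' code) of two layers xk = Ak Mk Pk xk−1 on a 1-D signal, where the exact kernel map φk is replaced by a Nyström feature map built on the homogeneous kernel with κ(u) = exp(u − 1); all sizes, anchors, and scales below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_patches(z, s):
    """P_k: centered patches of size s (periodic), flattened over channels, weight 1/sqrt(s)."""
    n = len(z)
    idx = (np.arange(n)[:, None] + (np.arange(s) - s // 2)) % n
    return z[idx].reshape(n, -1) / np.sqrt(s)

def hom_kernel(A, B):
    """Homogeneous dot-product kernel with kappa(u) = exp(u - 1), between rows of A and B."""
    na, nb = np.linalg.norm(A, axis=1), np.linalg.norm(B, axis=1)
    cos = (A @ B.T) / np.outer(na, nb)
    return np.outer(na, nb) * np.exp(cos - 1.0)

def kernel_map(patches, anchors):
    """M_k: Nystrom features psi(z) = K_ZZ^{-1/2} k_Z(z), applied to every patch."""
    w, V = np.linalg.eigh(hom_kernel(anchors, anchors) + 1e-8 * np.eye(len(anchors)))
    return hom_kernel(patches, anchors) @ (V @ np.diag(1.0 / np.sqrt(w)) @ V.T)

def gaussian_pool(z, sigma):
    """A_k: channel-wise circular convolution with a normalized Gaussian filter."""
    r = int(4 * sigma)
    u = np.arange(-r, r + 1)
    h = np.exp(-u ** 2 / (2 * sigma ** 2)); h /= h.sum()
    idx = (np.arange(len(z))[:, None] - u[None, :]) % len(z)
    return np.einsum('uvc,v->uc', z[idx], h)

def layer(x, patch_size, anchors, sigma):
    return gaussian_pool(kernel_map(extract_patches(x, patch_size), anchors), sigma)

x0 = rng.normal(size=(128, 1))                                      # input signal, one channel
x1 = layer(x0, patch_size=5, anchors=rng.normal(size=(8, 5)), sigma=2.0)
x2 = layer(x1, patch_size=9, anchors=rng.normal(size=(16, 9 * 8)), sigma=4.0)
print(x1.shape, x2.shape)                                           # (128, 8) then (128, 16)
```

Note how the patch size and pooling scale grow from the first layer to the second, echoing the exponential growth prescribed above.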

20

Page 21: Invariance and Stability of Deep Convolutional Representations

Stability to deformations

Outline

1 Introduction

2 Notation and basic mathematical tools

3 Construction of the Multilayer Convolutional Kernel Network (CKN)
      Patch extraction operator
      Kernel mapping operator
      Pooling operator
      Multilayer construction

4 Stability to deformations

5 Link with CNN

21

Page 22: Invariance and Stability of Deep Convolutional Representations

Stability to deformations

Stability to deformations: Definition

C¹ diffeomorphism: τ : Ω → Ω

action operator: Lτ x(u) = x(u − τ(u))

Representation Φ(·) is stable if:

‖Φ(Lτ x) − Φ(x)‖ ≤ (c1 ‖∇τ‖∞ + c2 ‖τ‖∞) ‖x‖,

where c1, c2 are two constants, ∇τ is the Jacobian of τ, ‖∇τ‖∞ = sup_{u∈Ω} ‖∇τ(u)‖, and ‖τ‖∞ = sup_{u∈Ω} |τ(u)|.

translation invariance: recovered when the second term vanishes (c2 → 0)
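To make the action operator concrete, a toy 1-D sketch (mine, using linear interpolation on a periodic grid) that applies Lτ x(u) = x(u − τ(u)) for a smooth deformation field τ and estimates ‖τ‖∞ and ‖∇τ‖∞ on the grid; the signal and τ are made up.

```python
import numpy as np

n = 256
u = np.linspace(0.0, 1.0, n, endpoint=False)
x = np.sin(2 * np.pi * 3 * u) + 0.5 * np.cos(2 * np.pi * 7 * u)   # a toy signal on [0, 1)

tau = 0.01 * np.sin(2 * np.pi * u)            # small, smooth deformation field
Ltau_x = np.interp(u - tau, u, x, period=1.0) # L_tau x(u) = x(u - tau(u)), periodic interp

tau_inf = np.max(np.abs(tau))                        # ||tau||_inf
grad_tau_inf = np.max(np.abs(np.gradient(tau, u)))   # ||grad tau||_inf (finite differences)
print(tau_inf, grad_tau_inf)                         # here ||grad tau||_inf < 1/2
```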

22

Page 23: Invariance and Stability of Deep Convolutional Representations

Stability to deformations

Stability results

Theorem

Let Φ(x) be a representation given by Φ(x) = Φn(A0x). If ‖∇τ‖∞ ≤ 1/2,

we have:

‖Φ(Lτ x) − Φ(x)‖ ≤ ( c1 (1 + n) ‖∇τ‖∞ + (c2/σn) ‖τ‖∞ ) ‖x‖.

Here we assume that the input signal is x0 = A0x, where A0 is an initial pooling operator used to control high frequencies. σn is the pooling scale of the last layer (reminder: it grows exponentially with the number of layers n). In particular, for a pure translation τ(u) = c we have ∇τ = 0, so the bound reduces to (c2/σn) |c| ‖x‖, which vanishes as σn grows: the representation becomes near translation-invariant.
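The bound's main message can be seen numerically with a crude one-layer analogue (my own toy, not the paper's representation): pooling a nonlinear pointwise feature of the signal at scale σ, a pure translation (so ∇τ = 0) moves the output less and less as σ grows.

```python
import numpy as np

def gaussian_pool(x, sigma):
    """Circular convolution with a normalized Gaussian filter of scale sigma."""
    r = int(4 * sigma)
    v = np.arange(-r, r + 1)
    h = np.exp(-v ** 2 / (2 * sigma ** 2)); h /= h.sum()
    idx = (np.arange(len(x))[:, None] - v[None, :]) % len(x)
    return x[idx] @ h

rng = np.random.default_rng(0)
x = rng.normal(size=512)
shift = 4                                       # a pure translation: grad(tau) = 0

for sigma in (1.0, 4.0, 16.0):
    phi = gaussian_pool(np.abs(x), sigma)                  # crude one-layer "representation"
    phi_shifted = gaussian_pool(np.abs(np.roll(x, shift)), sigma)
    rel = np.linalg.norm(phi_shifted - phi) / np.linalg.norm(x)
    print(f"sigma={sigma:5.1f}  relative distance={rel:.4f}")   # shrinks as sigma grows
```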

23

Page 24: Invariance and Stability of Deep Convolutional Representations

Link with CNN

Outline

1 Introduction

2 Notation and basic mathematical tools

3 Construction of the Multilayer Convolutional Kernel Network (CKN)
      Patch extraction operator
      Kernel mapping operator
      Pooling operator
      Multilayer construction

4 Stability to deformations

5 Link with CNN

24

Page 25: Invariance and Stability of Deep Convolutional Representations

Link with CNN

Link with CNN

CNN map construction:

CNN function fσ, with input image x0 ∈ L2(Ω, R^{p0}) with p0 channels.

feature maps at layer k, represented as a function zk ∈ L2(Ω, R^{pk})

a set of filters (w_k^i)_{i=1,...,pk} and a pointwise activation function δ

intermediate feature maps (before the pooling operation) zk, with i-th channel

z_k^i(u) = nk(u) δ( 〈w_k^i, Pk zk−1(u)〉 / nk(u) ).

Here Pk is the patch extraction operator and nk(u) = ‖Pk zk−1(u)‖.

Homogeneous activations: i.e., z ↦ ‖z‖ δ(〈g, z〉/‖z‖) for all g in Pk (see the sketch below).
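A minimal numpy sketch of one such layer on a 1-D, multi-channel signal (my own illustration, with δ = ReLU and made-up filter sizes): each output channel i equals nk(u) δ(〈w_k^i, Pk zk−1(u)〉 / nk(u)), which is 1-homogeneous in the patch.

```python
import numpy as np

def extract_patches(z, s):
    """P_k z: centered patches of size s (periodic boundaries), flattened over channels."""
    n = len(z)
    idx = (np.arange(n)[:, None] + (np.arange(s) - s // 2)) % n
    return z[idx].reshape(n, -1)

def homogeneous_cnn_layer(z, W, delta=lambda t: np.maximum(t, 0.0)):
    """z_k^i(u) = n_k(u) * delta(<w_k^i, P_k z(u)> / n_k(u)), with n_k(u) = ||P_k z(u)||."""
    P = extract_patches(z, s=W.shape[1] // z.shape[1])
    n_k = np.linalg.norm(P, axis=1, keepdims=True) + 1e-12    # guard against zero patches
    return n_k * delta((P @ W.T) / n_k)

rng = np.random.default_rng(0)
z0 = rng.normal(size=(64, 3))            # 1-D "image" with p0 = 3 channels
W1 = rng.normal(size=(8, 5 * 3))         # p1 = 8 filters over patches of size 5
z1 = homogeneous_cnn_layer(z0, W1)       # intermediate maps before pooling
print(z1.shape)                          # (64, 8)

# Homogeneity check: scaling the input by t > 0 scales the output by t.
print(np.allclose(homogeneous_cnn_layer(3.0 * z0, W1), 3.0 * z1))
```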

25