
Rigid-Motion Scattering for Image Classification

Laurent Sifre

PhD Defense

October 6th, 2014

Image Classification

[Figure: training set → representation → classifier, fitted by minimizing the training loss; the testing set is used to evaluate the testing error. Each image is mapped to a class; example classes: fabric, tree, brick.]

Representation For Image Classification

Building a representation that is:
- invariant to geometric transformations,
- informative,
- stable to deformations,
is a fundamental problem of computer vision.

Satisfying all three requirements at once is hard.

The modulus of the Fourier transform preserves information and is translation invariant, but it is highly unstable to deformations.
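The claim about the Fourier modulus can be checked numerically. The sketch below is only an illustration: the test signal (a Gaussian-windowed sinusoid), the 17-sample shift and the 5% dilation are all assumptions chosen for the example, not values from the slides.

```python
import numpy as np

def fourier_modulus(x):
    """Modulus of the discrete Fourier transform of a 1D signal."""
    return np.abs(np.fft.fft(x))

def windowed_sinusoid(t, center, freq=0.25, width=20.0):
    """Hypothetical test signal: a Gaussian window times a sinusoid."""
    window = np.exp(-0.5 * ((t - center) / width) ** 2)
    return window * np.cos(2 * np.pi * freq * (t - center))

def relative_distance(a, b):
    return np.linalg.norm(a - b) / np.linalg.norm(a)

n = 256
t = np.arange(n, dtype=float)
center = n / 2
x = windowed_sinusoid(t, center)

# Translation: circular shift by 17 samples.
x_translated = np.roll(x, 17)
# Small deformation: a 5% dilation around the center of the signal.
x_dilated = windowed_sinusoid(center + 1.05 * (t - center), center)

translation_error = relative_distance(fourier_modulus(x), fourier_modulus(x_translated))
dilation_error = relative_distance(fourier_modulus(x), fourier_modulus(x_dilated))
print(f"translation: {translation_error:.2e}, 5% dilation: {dilation_error:.2f}")
```

The translation error is at the level of floating-point round-off, while the small dilation moves the high-frequency energy to different Fourier bins and produces a large relative error.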

Existing Image Representations

Handcrafted shallow representations (SIFT, HOG, RIFT), Lowe and many others: various ad hoc techniques, difficult to combine to tackle harder problems.

Deep Convolutional Networks (ConvNets), Hinton, LeCun, Bengio: cascade of convolutions and regularizing non-linearities. Filters are learned. What filters, non-linearities, connectivity? Many questions, mostly answered by empirical performance.

Scattering networks (Mallat, Bruna): mathematical construction of deep « scattering » networks with thorough analysis of their properties.

[Figures: SIFT descriptor; LeNet architecture.]

Overview

• Wavelet transform and scattering network.

• Rigid-motion group.

• Separable scattering.

• Rigid-motion wavelet transform.

• Joint scattering.

• Application: texture classification.

• Application: separable ConvNets.

Problem: how to extend scattering to other groups?

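Wavelet Transform

Convolution with a window $\phi_J$ builds local translation invariance but also loses most of the information.

Complementary information is recovered by rotated and dilated wavelets $\psi_{\theta,j}$, indexed by scale $j$ and orientation $\theta$:

$$\psi_{\theta,j}(u) = 2^{-2j}\,\psi(2^{-j} r_{-\theta} u)$$

The wavelet transform decomposes an image $x$ into a local average and wavelet coefficients:

$$Wx = \{\, x \star \phi_J,\; x \star \psi_{\theta,j} \,\}_{\theta,\, j<J}$$

Unitary Wavelets

Definitions: The norm of the wavelet transform is

$$\|Wx\|^2 = \|x \star \phi_J\|^2 + \sum_{\theta,\, j<J} \|x \star \psi_{\theta,j}\|^2$$

The associated Littlewood-Paley function is

$$A(\omega) = |\hat\phi_J(\omega)|^2 + \sum_{\theta,\, j<J} |\hat\psi_{\theta,j}(\omega)|^2$$

Theorem (Littlewood-Paley): If $A$ tiles the Fourier plane tightly,

$$1 - \epsilon \le A(\omega) \le 1 \quad \text{for all } \omega,$$

then the wavelet transform almost preserves the norm:

$$(1-\epsilon)\,\|x\|^2 \le \|Wx\|^2 \le \|x\|^2$$

In this case we say that $W$ is an $\epsilon$ frame.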

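Wavelet Modulus

Complex modulus has a regularizing effect on analytic wavelet coefficients.

Wavelet modulus operator, with averaging window $\phi_J$ and wavelets $\psi_{\theta,j}$:

$$|W|x = \{\, \underbrace{x \star \phi_J}_{\text{invariant part}},\; \underbrace{|x \star \psi_{\theta,j}|}_{\text{non-linear covariant part}} \,\}_{\theta,\, j<J}$$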
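A wavelet modulus operator can be sketched with FFT-based circular convolutions. This is a minimal illustration, not the filters used in the thesis: the band-pass filters are Gabor-like Gaussians in the Fourier domain standing in for Morlet wavelets, and the names `gabor_hat`, `lowpass_hat` and the parameters `xi` and `sigma` are hypothetical choices.

```python
import numpy as np

def frequency_grid(n):
    w = 2 * np.pi * np.fft.fftfreq(n)
    return np.meshgrid(w, w, indexing="ij")

def gabor_hat(n, j, theta, xi=3 * np.pi / 4, sigma=0.8):
    """Fourier transform of a Gabor-like band-pass filter (assumed stand-in
    for a Morlet wavelet) at scale 2**j and orientation theta."""
    w1, w2 = frequency_grid(n)
    c1 = 2.0 ** -j * xi * np.cos(theta)
    c2 = 2.0 ** -j * xi * np.sin(theta)
    width = sigma * 2.0 ** -j
    psi_hat = np.exp(-((w1 - c1) ** 2 + (w2 - c2) ** 2) / (2 * width ** 2))
    psi_hat[0, 0] = 0.0  # force a zero mean, as for a wavelet
    return psi_hat

def lowpass_hat(n, J, sigma=0.8):
    """Fourier transform of a Gaussian averaging window at scale 2**J."""
    w1, w2 = frequency_grid(n)
    width = sigma * 2.0 ** -J
    return np.exp(-(w1 ** 2 + w2 ** 2) / (2 * width ** 2))

def wavelet_modulus(x, J=3, n_angles=4):
    """Return the local average and the modulus of each wavelet coefficient map."""
    n = x.shape[0]
    x_hat = np.fft.fft2(x)
    average = np.real(np.fft.ifft2(x_hat * lowpass_hat(n, J)))
    moduli = {}
    for j in range(J):
        for k in range(n_angles):
            theta = np.pi * k / n_angles
            moduli[(j, k)] = np.abs(np.fft.ifft2(x_hat * gabor_hat(n, j, theta)))
    return average, moduli

image = np.random.default_rng(0).standard_normal((32, 32))
average, moduli = wavelet_modulus(image)
print(average.shape, len(moduli))
```

Because every step is a circular convolution followed by a pointwise modulus, translating the input translates each output map by the same amount, which is the covariance that the averaging later turns into invariance.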

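Translation Scattering

Scattering is a cascade of wavelet-modulus operators, with wavelets $\psi_\lambda$, $\lambda = (\theta, j)$, and averaging window $\phi_J$. $m$ is the « scattering order ».

Order 0 Scattering

$$S_0 x = x \star \phi_J$$

• Local average of the image.
• Not very informative.

Order 1 Scattering

$$S_1 x(\lambda_1) = |x \star \psi_{\lambda_1}| \star \phi_J$$

• Local average of the wavelet modulus coefficients.
• Similar to SIFT descriptors.

Order 2 Scattering

$$S_2 x(\lambda_1, \lambda_2) = \big|\, |x \star \psi_{\lambda_1}| \star \psi_{\lambda_2} \big| \star \phi_J$$

• Deep coefficients.
• Much richer information.
• Similar to ConvNets.

Order m Scattering

Compact « path » notation, with $p = (\lambda_1, \dots, \lambda_m)$:

$$S_m x(p) = \big|\, \cdots \big|\, |x \star \psi_{\lambda_1}| \star \psi_{\lambda_2} \big| \cdots \star \psi_{\lambda_m} \big| \star \phi_J$$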
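The cascade and the path indexing can be sketched as follows. This is an illustration under stated assumptions: the filters are Gabor-like Gaussians in the Fourier domain (a stand-in for Morlet wavelets), the helper names are hypothetical, and second-order paths are restricted to increasing scales, which is a common pruning rather than something stated on the slides.

```python
import numpy as np

def filter_bank(n, J, n_angles, xi=3 * np.pi / 4, sigma=0.8):
    """Assumed filter bank: Gaussian low-pass phi and Gabor-like band-pass psi."""
    w = 2 * np.pi * np.fft.fftfreq(n)
    w1, w2 = np.meshgrid(w, w, indexing="ij")
    phi = np.exp(-(w1 ** 2 + w2 ** 2) / (2 * (sigma * 2.0 ** -J) ** 2))
    psi = {}
    for j in range(J):
        for k in range(n_angles):
            theta = np.pi * k / n_angles
            c1 = 2.0 ** -j * xi * np.cos(theta)
            c2 = 2.0 ** -j * xi * np.sin(theta)
            g = np.exp(-((w1 - c1) ** 2 + (w2 - c2) ** 2) / (2 * (sigma * 2.0 ** -j) ** 2))
            g[0, 0] = 0.0  # zero mean
            psi[(j, k)] = g
    return phi, psi

def conv(x, h_hat):
    """Circular convolution with a filter given by its Fourier transform."""
    return np.fft.ifft2(np.fft.fft2(x) * h_hat)

def scattering(x, J=3, n_angles=4, max_order=2):
    """Return a dict mapping each path (tuple of (j, k) indices) to a 2D map."""
    phi, psi = filter_bank(x.shape[0], J, n_angles)
    layer = {(): x}  # the order-0 path is empty
    coefficients = {}
    for order in range(max_order + 1):
        next_layer = {}
        for path, u in layer.items():
            # Output: local average of the current modulus map.
            coefficients[path] = np.real(conv(u, phi))
            if order == max_order:
                continue
            for lam, psi_hat in psi.items():
                # Assumed pruning: only keep paths of strictly increasing scale.
                if path and lam[0] <= path[-1][0]:
                    continue
                next_layer[path + (lam,)] = np.abs(conv(u, psi_hat))
        layer = next_layer
    return coefficients

image = np.random.default_rng(0).standard_normal((32, 32))
S = scattering(image)
orders = [sum(len(p) == m for p in S) for m in range(3)]
print(orders)  # number of paths of order 0, 1 and 2
```

With 3 scales and 4 orientations this yields 1 order-0 path, 12 order-1 paths and 48 order-2 paths.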

Scattering Properties

Theorem: (Mallat) Scattering almost preserves the norm. If , then

Theorem: (Mallat) Scattering is invariant to translations and stable to deformations. For any and any twice differentiable deformation such that ,

where

A deformation $\tau$ acts on an image $x$ with $L_\tau x(u) = x(u - \tau(u))$.

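The Rigid-Motion Group

• Rigid motion: $g = (v, \theta)$, a translation $v \in \mathbb{R}^2$ and a rotation of angle $\theta$

• Action on image position: $g u = v + r_\theta u$

• Compatibility equation: $g (g' u) = (g g') u$

• Group law: $(v, \theta)(v', \theta') = (v + r_\theta v',\; \theta + \theta')$

• Action on images: $g.x(u) = x(g^{-1} u)$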
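The group structure can be checked numerically. The sketch below assumes the usual parametrization of a rigid motion as a pair (v, θ) acting on a position u by v + r_θ u; the function names are hypothetical.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix r_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def act(g, u):
    """Action on a position: g u = v + r_theta u."""
    v, theta = g
    return v + rot(theta) @ u

def compose(g, h):
    """Group law: (v, theta)(v', theta') = (v + r_theta v', theta + theta')."""
    (v, theta), (vp, thetap) = g, h
    return (v + rot(theta) @ vp, theta + thetap)

def inverse(g):
    """Inverse rigid motion: (-r_{-theta} v, -theta)."""
    v, theta = g
    return (-rot(-theta) @ v, -theta)

g = (np.array([1.0, -2.0]), 0.7)
h = (np.array([0.5, 3.0]), -1.9)
u = np.array([2.0, 1.0])

# Compatibility: acting with h then g equals acting with the product g h.
print(np.allclose(act(g, act(h, u)), act(compose(g, h), u)))
# The group is not commutative: g h and h g translate differently.
print(np.allclose(compose(g, h)[0], compose(h, g)[0]))
```

The second check prints `False`, which is why convolutions on this group cannot be reduced to ordinary three-dimensional convolutions over (v, θ).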

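Separable Rigid-Motion Invariance

• Suppose that we have two operators that build invariance to translations and to rotations, respectively.

• Suppose that the translation-invariant operator is also covariant to rotations, that is, there exists an action of the rotation group on its output such that rotating the input amounts to applying this action to the output.

• Theorem 1: we can factorize the output of the translation-invariant operator into disjoint rotation orbits and apply the rotation-invariant operator along each orbit. The resulting operator is invariant to rigid motions.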

Rotation Covariance of Scattering

The Morlet wavelets are oriented, therefore a change of variable shows that

Cascading this yields

where

For a fully delocalized scattering

Separable Scattering

Translation Scattering → Orbit Extraction → Orientation Scattering

• Same properties as scattering
• + rotation invariance

Separable invariants are invariant to a larger group than intended

[Figure: each row of an image is translated independently. 1D Fourier + modulus along rows, then 1D Fourier + modulus along columns: identical representation. 2D Fourier + modulus: non-identical representation.]
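The row-translation example can be reproduced in a few lines. The image is random and the per-row shifts are arbitrary values picked for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))

# Translate each row independently (circularly) by a different amount.
shifts = [0, 3, 1, 6, 2, 7, 5, 4]
y = np.stack([np.roll(row, s) for row, s in zip(x, shifts)])

def separable_fourier_modulus(img):
    """1D Fourier + modulus along rows, then 1D Fourier + modulus along columns."""
    rows = np.abs(np.fft.fft(img, axis=1))
    return np.abs(np.fft.fft(rows, axis=0))

def joint_fourier_modulus(img):
    """2D Fourier + modulus."""
    return np.abs(np.fft.fft2(img))

separable_equal = np.allclose(separable_fourier_modulus(x), separable_fourier_modulus(y))
joint_equal = np.allclose(joint_fourier_modulus(x), joint_fourier_modulus(y))
print(separable_equal, joint_equal)
```

The separable invariant cannot tell the two images apart because the first modulus already erases each row's own shift, whereas the joint 2D transform still sees how the rows are aligned with each other.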

Wavelet Modulus Separates Horizontal and Vertical Grids

[Figure: the wavelet modulus responses of the two textures are either equal or equal up to a translation.]

Scattering retransforms different paths independently and then averages, which removes the translation.

Both textures have the same scattering.

Signal processing on groups

• Recent works [Boscain12, Duits12] have developed signal processing tools on the rigid-motion group.

• Recent ConvNets [Krizhevsky12] use three-dimensional convolutions to capture higher-level concepts.

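Rigid-motion Convolution

• For any group $G$:

$$\tilde x \circledast \tilde y\,(g) = \int_G \tilde x(g')\, \tilde y(g'^{-1} g)\, dg'$$

• For the rigid-motion group, with $g = (v, \theta)$:

$$\tilde x \circledast \tilde y\,(v, \theta) = \int_0^{2\pi}\!\!\int_{\mathbb{R}^2} \tilde x(v', \theta')\, \tilde y\big(r_{-\theta'}(v - v'),\, \theta - \theta'\big)\, dv'\, d\theta'$$

• Naive implementation: $O(N^2 K^2)$ operations, where $N$ = # positions and $K$ = # orientations.

• Factorization of convolutions for separable filters $\tilde y(v, \theta) = y(v)\, \bar y(\theta)$:

$$\tilde x \circledast \tilde y\,(v, \theta) = \int_0^{2\pi} \Big( \int_{\mathbb{R}^2} \tilde x(v', \theta')\, y\big(r_{-\theta'}(v - v')\big)\, dv' \Big)\, \bar y(\theta - \theta')\, d\theta'$$

Fast rigid-motion convolutions

• Separable rigid-motion filters: the product of a 2D spatial filter and a 1D orientation filter.

• Naive convolution: one sum over all positions and orientations per output coefficient.

• Separable convolution: a 2D convolution along positions followed by a 1D convolution along orientations.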
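The factorization can be verified on a small discrete example. To keep rotations exact, this sketch makes strong simplifying assumptions that are not in the slides: the orientations are the four multiples of 90 degrees, positions live on a periodic 5x5 grid, and the filter is the product of a spatial filter `y` and an orientation filter `ybar` (hypothetical names).

```python
import numpy as np

K, N = 4, 5  # orientations (multiples of 90 degrees), side of the periodic grid

def rotate_filter(y, k):
    """Return y_k(u) = y(r^{-k} u), with r the rotation by 90 degrees on the periodic grid."""
    n = y.shape[0]
    u1, u2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    for _ in range(k % 4):
        u1, u2 = u2, (-u1) % n  # r^{-1}(u1, u2) = (u2, -u1)
    return y[u1, u2]

def naive_conv(x, y, ybar):
    """Direct sum over all positions v' and orientations theta' for each output."""
    rotated = [rotate_filter(y, k) for k in range(K)]
    idx = np.arange(N)
    out = np.zeros_like(x)
    for t in range(K):
        for v1 in range(N):
            for v2 in range(N):
                for tp in range(K):
                    # shifted[w1, w2] = y(r^{-tp}(v - w))
                    shifted = rotated[tp][(v1 - idx)[:, None] % N, (v2 - idx)[None, :] % N]
                    out[t, v1, v2] += (x[tp] * shifted).sum() * ybar[(t - tp) % K]
    return out

def separable_conv(x, y, ybar):
    """2D convolution of each orientation slice, then 1D convolution along orientations."""
    spatial = np.stack([
        np.real(np.fft.ifft2(np.fft.fft2(x[tp]) * np.fft.fft2(rotate_filter(y, tp))))
        for tp in range(K)
    ])
    ybar_hat = np.fft.fft(ybar)[:, None, None]
    return np.real(np.fft.ifft(np.fft.fft(spatial, axis=0) * ybar_hat, axis=0))

rng = np.random.default_rng(0)
x = rng.standard_normal((K, N, N))
y = rng.standard_normal((N, N))
ybar = rng.standard_normal(K)
print(np.allclose(naive_conv(x, y, ybar), separable_conv(x, y, ybar)))
```

Note that each orientation slice is convolved with a rotated copy of the spatial filter; this is what distinguishes a rigid-motion convolution from an ordinary 3D convolution over positions and orientations.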

2D Separable Wavelets

[Figure: 1D wavelets and the resulting 2D separable wavelets.]

4 types of wavelets, one for every possible combination.

Separable Joint Wavelet Transform

A separable joint wavelet family is defined as

where the three indices are the spatial orientation, the spatial scale and the orientation scale.

The associated wavelet transform is an operator defined as

Rigid-motion Wavelet Frame

Theorem 2: If there exist such that

and

then the family is an frame i.e.

where

2D Fast Wavelet Transform (FWT)

Suppose that there exist filters such that

Then

can be computed as

Rigid-Motion FWT

• For each slice, 2D FWT.
• For each leaf, 1D FWT.

Covariance of the Wavelet Transform

Property: the 2D wavelet transform is covariant to the action of the rigid-motion group. For any rigid motion and any image,

where is defined for as

and for as

Joint Scattering

First, 2D wavelet modulus:

Then, cascade of rigid-motion wavelet modulus:

Order 0 Joint Scattering

• Same as order 0 translation scattering.
• Local average of input image.
• Local translation invariance.
• Full rotation invariance.
• Not very informative.

Order 1 Joint Scattering

• Spatial and orientation local average of the wavelet modulus.
• Indexed by position and scale.
• Family of fully rotation-invariant 2D signals if the orientation average covers all angles, as is the case here.
• Family of partially invariant 3D signals if the orientation average is only local.

Order 2 Joint Scattering

• Same invariance properties as order 1.
• Interactions between different positions and orientations.

Joint Scattering Invariance

Theorem 3: there exists a constant such that for any the rigid-motion joint scattering at spatial scale and at rotational scale verifies

• The term is the largest displacement induced by on the support of .

• If and then

Outex 10 with Separable Scattering

Training: single orientation.

Testing (rot): 8 rotations.

Testing (rot-shear): 8 orientations, horizontal shear of 1.3.

• Good test case for invariant descriptors: the invariance cannot be learned from the data.

• Nearest neighbor classifier.

Stability

Higher orders improve results

Scale Invariance

• Scale is different from rotations:
  - Limited range of available scales.
  - Not a periodic group.
  - Wavelet coefficients at different scales are sampled at different resolutions.

• These differences make it difficult to use wavelets along scales.

• Scattering is stable to deformations and dilations, thus slightly dilated or deformed versions of the same signal lie on a low-dimensional subspace.

• We thus use the PCA classifier from [Bruna12] for experiments involving significant deformations and scale changes.

PCA Classifier

At training time, the PCA classifier models each class as the affine subspace generated by the first eigenvectors in the SVD of all scattering vectors of the class.

At testing time, a test image is classified according to the minimum projection error of its scattering vector:
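The two steps above can be sketched as follows. This is a minimal illustration, not the implementation of [Bruna12]: the function names and the subspace dimension `dim` are hypothetical, and the data is synthetic rather than actual scattering vectors.

```python
import numpy as np

def fit_pca_classifier(features, labels, dim):
    """For each class, store the mean and the first `dim` principal directions."""
    model = {}
    for c in np.unique(labels):
        class_vectors = features[labels == c]
        mean = class_vectors.mean(axis=0)
        # Rows of vt are the principal directions of the centered class vectors.
        _, _, vt = np.linalg.svd(class_vectors - mean, full_matrices=False)
        model[int(c)] = (mean, vt[:dim])
    return model

def projection_error(vector, mean, basis):
    """Distance from `vector` to the affine subspace mean + span(basis)."""
    residual = vector - mean
    return np.linalg.norm(residual - basis.T @ (basis @ residual))

def predict(model, vector):
    """Assign the class whose affine subspace is closest to the vector."""
    return min(model, key=lambda c: projection_error(vector, *model[c]))

# Synthetic data: each class lies near its own 2D affine subspace of R^10.
rng = np.random.default_rng(0)
def sample(class_id, count):
    points = 0.01 * rng.standard_normal((count, 10))
    points[:, 3 * class_id] += 5.0  # class mean
    points[:, 3 * class_id + 1 : 3 * class_id + 3] += rng.standard_normal((count, 2))
    return points

train = np.concatenate([sample(0, 20), sample(1, 20)])
train_labels = np.array([0] * 20 + [1] * 20)
model = fit_pca_classifier(train, train_labels, dim=2)
predictions = [predict(model, v) for v in np.concatenate([sample(0, 5), sample(1, 5)])]
print(predictions)
```

Classifying by projection error rather than by distance to the class mean makes the decision insensitive to variations inside each class subspace, which is where the variability due to small dilations and deformations is expected to lie.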

Logarithm and Scale Augmentation

• Scattering vectors of texture images typically have a power-law behavior w.r.t. scale. A logarithm helps further linearize this behavior.
• To improve scale invariance, we augment the training set with the scattering of dilated versions of each image at several scales.
• At testing time, we average the scattering of dilated versions of the original image.
• The scattering is covariant to scale. Dilated scattering vectors can be deduced from the scattering vector of the original image.
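As a worked illustration of why the logarithm helps, assume an idealized pure power law in the scale index $j$, with hypothetical constants $C > 0$ and $\alpha$ (real textures only follow this approximately):

```latex
S x(j) \approx C\, 2^{\alpha j}
\quad\Longrightarrow\quad
\log S x(j) \approx \log C + (\alpha \log 2)\, j ,
\qquad
\log S x(j + s) \approx \log S x(j) + \alpha s \log 2 .
```

Under this assumption the log-scattering is affine in $j$, and a dilation that shifts the scale index by $s$ only adds a constant offset, which an affine subspace model can absorb.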

KTH-Tips

• 10 classes.
• 9 scales, 3 viewpoints, 3 illuminations = 81 images/class.
• Low resolution 200x200.
• No in-plane rotation.
• Data is split between training and testing.
• Results are averaged over 200 random splits.

Rotation invariance does not degrade accuracy.

Scale invariance increases accuracy.

UIUCTex

• 25 classes of 40 images.
• Higher resolution 640x480.
• Large, uncalibrated affine transformations.
• Large deformations.

Rotation invariance increases accuracy.

UMD

• 25 classes of 40 images.
• Similar to UIUC.
• Higher resolution 1280x960.

Rotation invariance increases accuracy.

Hyperparameters

• max scale for KTH-Tips, UIUCTex, UMD.

• scales per octave.

• orientations between

• full rotational invariance.

• 4 dilations for scale invariance.

• Mirror padding to avoid losing too much at boundaries.

State-of-the-art results on three datasets with almost the same hyperparameters.

Separable ConvNet

• The main difference between translation and joint scattering is the 3D convolution, which recombines the information from different paths.

• ConvNet 2000’s: 2D convolutions (LeNet).

• ConvNet 2010’s: 3D convolutions (AlexNet).

First 2 layers of AlexNet

[Figure: filters of the first layer and of the second layer.]

Highly redundant along input depths: waste of capacity.

Separable Convolutions in ConvNets

[Figure: a vanilla layer uses a 3D conv; a separable layer uses a 2D conv followed by a 1D conv.]

For a given capacity:
• Less to learn
• Less to compute

ImageNet ILSVRC2012: 1 K classes, 1.2 M images.

20% fewer steps to accuracy with Google’s AlexNet implementation.
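The saving can be made concrete with a small NumPy sketch. It assumes one particular reading of « 2D conv then 1D conv »: one spatial filter per input depth, followed by a filter along depth for each output channel. The layer sizes are arbitrary, and the operation is the cross-correlation used in ConvNets.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

c_in, c_out, k = 8, 16, 3  # hypothetical layer sizes
rng = np.random.default_rng(0)
x = rng.standard_normal((c_in, 12, 12))
spatial = rng.standard_normal((c_in, k, k))  # one 2D filter per input depth
depth = rng.standard_normal((c_out, c_in))   # one 1D filter along depth per output

# All k x k patches of every input channel: shape (c_in, 10, 10, k, k).
windows = sliding_window_view(x, (k, k), axis=(1, 2))

# Vanilla layer: a full 3D kernel per output, here built from the separable factors.
full_kernel = depth[:, :, None, None] * spatial[None]
vanilla = np.einsum("ocab,cijab->oij", full_kernel, windows)

# Separable layer: 2D conv on each input depth, then 1D conv along depth.
per_depth = np.einsum("cab,cijab->cij", spatial, windows)
separable = np.einsum("oc,cij->oij", depth, per_depth)

vanilla_params = c_out * c_in * k * k
separable_params = c_in * k * k + c_out * c_in
print(np.allclose(vanilla, separable), vanilla_params, separable_params)
```

For these sizes the separable layer has 200 parameters against 1152 for the vanilla layer. A general 3D kernel is not of this factorized form, so the separable layer trades expressiveness per filter for fewer parameters and operations.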

Conclusion

• Problem: the scattering transform (Mallat et al.) is a translation-invariant, informative and stable signal representation. How can its properties be extended to other, more complicated groups that affect natural images?

• Focus on the affine group for theory and on the rigid-motion group (translations and rotations) in applications.

• The separable scattering is the most straightforward way. It cascades scattering along the position and along the orientation parameter. It loses information about the joint distribution of internal variables.

• The joint scattering recombines the internal variables of intermediate layers by cascading wavelet modulus operators on the geometric group. It is a tighter invariant.

• Proofs that these operators are unitary and invariant.

• Fast algorithms.

• Texture classification: state-of-the-art on most datasets.

• Generic object classification: more efficient convolutions in ConvNets.
