Rigid-Motion Scattering for Image Classification Laurent Sifre PhD Defense October 6th, 2014


Page 1:

Rigid-Motion Scattering for Image Classification

Laurent Sifre

PhD Defense

October 6th, 2014

Page 2:

Image Classification

[Diagram: a training set and a testing set of labeled images each pass through a representation and a classifier. Training minimizes the training loss; testing evaluates the testing error. Example (image, class) pairs: fabric, tree, brick.]

Page 3:

Representation For Image Classification

Building a representation that is:
- invariant to geometric transformations,
- informative,
- stable to deformations
is a fundamental problem of computer vision.

It is hard to satisfy all three requirements at once.

The modulus of the Fourier transform preserves information and is translation invariant, but it is highly unstable to deformations.
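The translation invariance of the Fourier modulus can be checked numerically. The snippet below is a minimal sketch, assuming a periodic 1D signal and a circular shift as the translation; the helper names (`dft`, `fourier_modulus`, `circular_shift`) are illustrative and not taken from the slides. It does not illustrate the instability to deformations.

```python
import cmath

def dft(signal):
    """Discrete Fourier transform of a 1D sequence (direct O(n^2) sum)."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]

def fourier_modulus(signal):
    """Modulus of the DFT: discards the phase, which carries the position."""
    return [abs(c) for c in dft(signal)]

def circular_shift(signal, shift):
    """Translate a periodic signal by `shift` samples to the right."""
    shift %= len(signal)
    return signal[len(signal) - shift:] + signal[:len(signal) - shift]

x = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0]
x_shifted = circular_shift(x, 3)

# Largest difference between the two moduli: zero up to floating-point rounding.
gap = max(abs(a - b) for a, b in zip(fourier_modulus(x), fourier_modulus(x_shifted)))
print(gap)
```

A shift only multiplies each Fourier coefficient by a unit-modulus phase, so taking the modulus removes it.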

Page 4:

Existing image representations

Deep convolutional networks (ConvNets) (Hinton, LeCun, Bengio): a cascade of convolutions and regularizing non-linearities. Filters are learned. Which filters, non-linearities, and connectivity? Many such questions are mostly answered by empirical performance.

Scattering networks (Mallat, Bruna): a mathematical construction of deep « scattering » networks with a thorough analysis of their properties.

Handcrafted shallow representations (SIFT, HOG, RIFT; Lowe and many others): various ad hoc techniques that are difficult to combine to tackle harder problems.

[Figures: SIFT, LeNet]

Page 5:

Overview

• Wavelet transform and scattering network.

• Rigid-motion group.

• Separable scattering.

• Rigid-motion wavelet transform.

• Joint scattering.

• Application: texture classification.

• Application: separable ConvNets.

Problem: how to extend scattering to other groups?

Page 6:

Wavelet Transform

Convolution with a window builds local translation invariance but also loses most of the information.

Complementary information is recovered by rotated and dilated wavelets, indexed by scale and orientation.

The wavelet transform decomposes the image into a local average and wavelet coefficients.

Page 7:

Unitary Wavelets

Definitions: The norm of the wavelet transform is

The associated Littlewood-Paley function is

Theorem (Littlewood-Paley): If it tiles the Fourier plane tightly

then the wavelet transform almost preserves the norm

In this case we say that is an frame.
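The definitions and the theorem on this slide are usually written as follows. This is a sketch in standard scattering notation, which is an assumption here rather than a copy of the slide: $x$ is the image, $\phi_J$ the averaging window at scale $2^J$, $\psi_{\theta,j}$ the wavelet with orientation $\theta$ and scale $2^j$, $\hat{\cdot}$ the Fourier transform, and $\epsilon$ the tightness of the tiling.

```latex
\[
Wx = \big\{\, x \star \phi_J,\; x \star \psi_{\theta,j} \,\big\}_{\theta,\, j<J},
\qquad
\|Wx\|^2 = \|x \star \phi_J\|^2 + \sum_{\theta,\, j<J} \|x \star \psi_{\theta,j}\|^2 ,
\]
\[
A(\omega) = |\hat\phi_J(\omega)|^2 + \sum_{\theta,\, j<J} |\hat\psi_{\theta,j}(\omega)|^2 ,
\]
\[
1-\epsilon \le A(\omega) \le 1 \ \text{ for all } \omega
\quad\Longrightarrow\quad
(1-\epsilon)\,\|x\|^2 \le \|Wx\|^2 \le \|x\|^2 .
\]
```

The implication follows from the Plancherel formula, since $\|Wx\|^2$ is the integral of $|\hat x(\omega)|^2 A(\omega)$ up to a normalization constant.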

Page 8:

Wavelet Modulus

Complex modulus has a regularizing effect on analytic wavelet coefficients.

Wavelet modulus operator, with an invariant part and a non-linear covariant part:

Page 9:

Translation Scattering

Scattering is a cascade of wavelet modulus operators:

m is the « scattering order ».

Page 10:

Order 0 Scattering

Scattering is a cascade of wavelet modulus operators:

• Local average of the image.

• Not very informative.

Page 11:

Order 1 Scattering

Scattering is a cascade of wavelet modulus operators:

• Local average of the wavelet modulus coefficients.

• Similar to SIFT descriptors.

Page 12:

Order 2 Scattering

Scattering is a cascade of wavelet modulus operators:

• Deep coefficients.

• Much richer information.

• Similar to ConvNets.

Page 13:

Order m Scattering

Scattering is a cascade of wavelet modulus operators:

Compact « path » notation
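The cascade and the path indexing can be sketched on a periodic 1D signal. This is a toy illustration under stated assumptions, not the construction used in the talk: the wavelets are real dilated Haar differences (hypothetical stand-ins for analytic Morlet wavelets), the averaging is fully delocalized (a global mean), only paths with increasing scales are kept, and all function names are illustrative.

```python
def circ_conv(signal, kernel):
    """Circular convolution of a periodic signal with a short kernel."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i - k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]

def haar_wavelet(j):
    """Hypothetical stand-in wavelet at scale 2**j: a dilated Haar difference."""
    kernel = [0.0] * (2 ** j + 1)
    kernel[0], kernel[-1] = 0.5, -0.5
    return kernel

def wavelet_modulus(signal, j):
    """Covariant part of one layer: |x * psi_j|."""
    return [abs(v) for v in circ_conv(signal, haar_wavelet(j))]

def average(signal):
    """Invariant part: averaging window that covers the whole signal."""
    return sum(signal) / len(signal)

def scattering(signal, num_scales, max_order=2):
    """Return {path: coefficient}; a path is a tuple of scales (j1, ..., jm)."""
    coeffs = {}
    layer = {(): list(signal)}  # the order-0 path is empty
    for order in range(max_order + 1):
        next_layer = {}
        for path, u in layer.items():
            coeffs[path] = average(u)
            if order == max_order:
                continue
            start = path[-1] + 1 if path else 0  # keep increasing scales only
            for j in range(start, num_scales):
                next_layer[path + (j,)] = wavelet_modulus(u, j)
        layer = next_layer
    return coeffs

x = [0.0, 1.0, 4.0, 2.0, 0.0, 3.0, 1.0, 0.0, 0.0, 2.0, 5.0, 1.0, 0.0, 0.0, 1.0, 0.0]
shifted = x[-5:] + x[:-5]  # circular translation by 5 samples
s, s_shifted = scattering(x, 3), scattering(shifted, 3)
print(sorted(s))  # paths of order 0, 1 and 2
```

With a global average, every coefficient is unchanged by a circular translation, which is the fully delocalized limit of the invariance described on these slides.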

Page 14:

Scattering Properties

Theorem: (Mallat) Scattering almost preserves the norm. If , then

Theorem: (Mallat) Scattering is invariant to translations and stable to deformations. For any and any twice differentiable deformation such that ,

where

A deformation acts on image with

Page 15:

The Rigid-Motion Group

• Rigid-motion

• Action on image position

• Compatibility equation

• Group law

• Action on images
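The items above can be sketched in code. The formulas did not survive in this transcript, so the conventions below are the usual ones for the 2D rigid-motion group and are an assumption: a rigid motion is a pair g = (v, θ), it acts on a position u as g.u = v + r_θ u, and the group law is whatever makes the compatibility equation (g'.g).u = g'.(g.u) hold. The class and method names are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class RigidMotion:
    """g = (v, theta): rotate by theta, then translate by v = (vx, vy)."""
    vx: float
    vy: float
    theta: float

    def act(self, u):
        """Action on an image position: g.u = v + r_theta u."""
        x, y = u
        c, s = math.cos(self.theta), math.sin(self.theta)
        return (self.vx + c * x - s * y, self.vy + s * x + c * y)

    def compose(self, other):
        """Group law: (v', theta').(v, theta) = (v' + r_theta' v, theta' + theta)."""
        wx, wy = self.act((other.vx, other.vy))
        return RigidMotion(wx, wy, self.theta + other.theta)

    def inverse(self):
        """g^-1 = (-r_{-theta} v, -theta)."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return RigidMotion(-(c * self.vx + s * self.vy),
                           -(-s * self.vx + c * self.vy),
                           -self.theta)

def act_on_image(g, image):
    """Action on images: (L_g x)(u) = x(g^-1.u), with `image` a function of position."""
    g_inv = g.inverse()
    return lambda u: image(g_inv.act(u))

g = RigidMotion(1.0, -2.0, math.pi / 3)
h = RigidMotion(0.5, 4.0, -math.pi / 5)
u = (3.0, 7.0)
lhs = g.compose(h).act(u)  # (g.h).u
rhs = g.act(h.act(u))      # g.(h.u)
print(lhs, rhs)
```

The two printed positions agree up to rounding, which is the compatibility equation; the rotation component of the group law simply adds angles, while the translation component is rotated before being added.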

Page 16:

Separable Rigid-Motion Invariance

• Suppose that we have two operators that build invariance to translations and to rotations, respectively.

• Suppose that the first operator is also covariant to rotations, that is, there exists an action of rotations on its output such that

• Theorem 1: we can factorize the output of the first operator into disjoint orbits and apply the second operator along each orbit. The resulting operator is invariant to rigid motions.

Page 17:

Rotation Covariance of Scattering

The Morlet wavelets are oriented, therefore a change of variable shows that

Cascading this yields

where

For a fully delocalized scattering

Page 18:

Separable Scattering

[Diagram stages: Translation Scattering, Orbit Extraction, Orientation Scattering]

• Same properties as scattering

• Plus rotation invariance

Page 19:

Separable invariants are invariant to a larger group than intended

[Figure: an image and a copy with each row translated independently. 1D Fourier + modulus along rows, then 1D Fourier + modulus along columns: identical representation. 2D Fourier + modulus: non-identical representation.]
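The comparison on this slide can be reproduced on a tiny example. The sketch below assumes a 4x4 periodic image made of vertical bars and circular row shifts; the function names are illustrative. It compares the separable invariant (1D Fourier modulus along rows, then along columns) with the 2D Fourier modulus.

```python
import cmath

def dft(seq):
    """Direct 1D discrete Fourier transform."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]

def transpose(img):
    return [list(col) for col in zip(*img)]

def modulus_along_rows(img):
    """1D Fourier + modulus applied to each row."""
    return [[abs(c) for c in dft(row)] for row in img]

def separable_fourier_modulus(img):
    """1D Fourier + modulus along rows, then 1D Fourier + modulus along columns."""
    rows = modulus_along_rows(img)
    return transpose(modulus_along_rows(transpose(rows)))

def fourier_modulus_2d(img):
    """2D Fourier transform (rows then columns, no modulus in between) + modulus."""
    rows = [dft(row) for row in img]
    cols = [dft(col) for col in transpose(rows)]
    return transpose([[abs(c) for c in col] for col in cols])

def shift_rows(img, shifts):
    """Translate each row independently (circularly) by its own shift."""
    out = []
    for row, s in zip(img, shifts):
        s %= len(row)
        out.append(row[len(row) - s:] + row[:len(row) - s])
    return out

def max_gap(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

image = [[1.0, 2.0, 0.0, 0.0] for _ in range(4)]  # vertical bars
warped = shift_rows(image, [0, 1, 2, 3])          # bars become diagonal

separable_gap = max_gap(separable_fourier_modulus(image),
                        separable_fourier_modulus(warped))
joint_gap = max_gap(fourier_modulus_2d(image), fourier_modulus_2d(warped))
print(separable_gap, joint_gap)
```

The separable representation cannot tell vertical bars from diagonal ones, because the first modulus already discards each row's position; the 2D modulus keeps the relative positions of the rows and separates the two images.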

Page 20:

Wavelet Modulus Separates Horizontal and Vertical Grids

[Figure labels: Equal, Equal, Translated]

Scattering retransforms different paths independently and then averages, which removes the translation.

Both textures have the same scattering.

Page 21:

Signal processing on groups

• Recent works [Boscain12, Duits12] have developed signal processing tools on the rigid-motion group.

• A recent ConvNet [Krizhevsky12] uses three-dimensional convolutions to capture higher-level concepts.

Page 22:

Rigid-motion Convolution

• For any group:

• For the rigid-motion group:

• Naive implementation (in terms of the number of positions and the number of orientations):

Page 23:

Fast rigid-motion convolutions

• Separable rigid-motion filters: [2D spatial] × [1D orientation]

• Factorization of convolutions for separable filters: [2D conv] then [1D conv]

Naive convolution:

Separable convolution:
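The factorization can be written out as a short derivation. The notation below is an assumption chosen for illustration: $\tilde x(v,\theta)$ is a signal on the rigid-motion group, $\tilde y(v,\theta) = y(v)\,\bar y(\theta)$ is a separable filter, $r_\theta$ is the rotation by $\theta$, and the group law is the usual one, $g'.g = (v' + r_{\theta'} v,\ \theta' + \theta)$.

```latex
\[
(\tilde x \circledast \tilde y)(g) = \int_G \tilde x(g')\, \tilde y(g'^{-1} g)\, dg' ,
\qquad
g'^{-1} g = \big(r_{-\theta'}(v - v'),\ \theta - \theta'\big),
\]
\[
(\tilde x \circledast \tilde y)(v,\theta)
= \int_{0}^{2\pi}\!\!\int_{\mathbb{R}^2} \tilde x(v',\theta')\,
  \tilde y\big(r_{-\theta'}(v - v'),\ \theta - \theta'\big)\, dv'\, d\theta' ,
\]
\[
\tilde y(v,\theta) = y(v)\,\bar y(\theta)
\;\Longrightarrow\;
(\tilde x \circledast \tilde y)(v,\theta)
= \int_{0}^{2\pi}
  \Big( \underbrace{\int_{\mathbb{R}^2} \tilde x(v',\theta')\,
  y\big(r_{-\theta'}(v - v')\big)\, dv'}_{\text{2D conv of slice } \theta'
  \text{ with } y \text{ rotated by } \theta'} \Big)\,
  \bar y(\theta - \theta')\, d\theta' .
\]
```

Under these assumptions, a direct implementation on $N$ positions and $K$ orientations with filter supports of $P$ positions and $Q$ orientations costs about $NKPQ$ multiplications for the naive sum, against about $NK(P+Q)$ when the 2D convolutions are computed first and the 1D convolution along orientations second.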

Page 24:

2D Separable Wavelets

1D wavelets

2D separable wavelets

4 types of wavelets, one for every possible combination

Page 25:

Separable Joint Wavelet Transform

A separable joint wavelet family is defined as

where the three indices are the spatial orientation, the spatial scale, and the orientation scale.

The associated wavelet transform is an operator

defined as

Page 26:

Rigid-motion Wavelet Frame

Theorem 2: If there exist such that

and

then the family is an frame i.e.

where

Page 27:

2D Fast Wavelet Transform (FWT)

Suppose that there exist filters such that

Then

can be computed as

Page 28:

Rigid-Motion FWT

For each slice, 2D FWT

For each leaf, 1D FWT

Page 29:

Covariance of the Wavelet Transform

Property: the 2D wavelet transform is covariant to the action of the rigid-motion group. For any rigid motion and any image,

where is defined for as

and for as

Page 30:

Joint Scattering

First, 2D wavelet modulus:

Then, cascade of rigid-motion wavelet modulus:

Page 31:

Order 0 Joint Scattering

• Same as order 0 translation scattering.

• Local average of input image.

• Local translation invariance.

• Full rotation invariance.

• Not very informative.

Page 32:

Order 1 Joint Scattering

• Spatial and orientation local average of the wavelet modulus.

• Indexed by position and scale.

• Family of fully rotation invariant 2D signals if , as is the case here.

• Family of partially invariant 3D signals if .

Page 33:

Order 2 Joint Scattering

• Same invariance properties as order 1.

• Interactions between different positions and orientations.

Page 34:

Joint Scattering Invariance

Theorem 3: there exists a constant such that for any the rigid-motion joint scattering at spatial scale and at rotational scale verifies

• The term is the largest displacement induced by on the support of .

• If and then

Page 35:

OUTex 10 with Separable Scattering

Training: Single orientation.

Testing: (rot) 8 rotations.

Testing: (rot-shear) 8 orientations. Shear 1.3 horizontal.

• Good test case for invariant descriptors: the invariance cannot be learned from the data.

• Nearest neighbor classifier.

[Results table annotations: Stability; Higher orders improve results]

Page 36:

Scale Invariance

• Scale is different from rotations:
  - Limited range of available scales.
  - Not a periodic group.
  - Wavelet coefficients at different scales are sampled at different resolutions.

• These differences make it difficult to use wavelets along scales.

• Scattering is stable to deformations and dilations, thus slightly dilated or deformed versions of the same signal lie on a low-dimensional subspace.

• We thus use the PCA classifier from [Bruna12] for experiments involving significant deformations and scale changes.

Page 37:

PCA Classifier

At training time, the PCA classifier models each class as the affine subspace generated by the leading eigenvectors in the SVD of all scattering vectors of the class.

At testing time, a test image is classified according to the minimum projection error of its scattering vector:
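A common way to write this classifier is given below. It is a sketch under assumed notation, not the slide's own formula: $Sx$ is the scattering vector of image $x$, $T_c$ the set of training images of class $c$, $d$ the number of retained principal directions, and $P_{V_c}$ the orthogonal projector onto $V_c$.

```latex
\[
\mu_c = \frac{1}{|T_c|}\sum_{x \in T_c} Sx,
\qquad
V_c = \operatorname{span}\big\{\text{first } d \text{ principal directions of }
      \{Sx - \mu_c\}_{x \in T_c}\big\},
\]
\[
\hat c(x) = \arg\min_{c}\ \big\| (\mathrm{Id} - P_{V_c})\,(Sx - \mu_c) \big\| .
\]
```

The affine subspace of class $c$ is $\mu_c + V_c$, and the quantity being minimized is the distance from $Sx$ to that subspace.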

Page 38:

Logarithm and Scale Augmentation

• Scattering vectors of texture images typically have a power-law behavior w.r.t. scale. A logarithm helps further linearize this behavior.

• To improve scale invariance, we augment the training set with the scattering of dilated versions of each image at several scales.

• At testing time, we average the scattering of dilated versions of the original image.

• The scattering is covariant to scale. Dilated scattering vectors can be deduced from the scattering vector of the original image.
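As a one-line illustration of why the logarithm helps, assume that a scattering coefficient follows an exact power law in the scale $2^j$, with hypothetical constants $C > 0$ and $\alpha$:

```latex
\[
Sx(j) \approx C\, 2^{\alpha j}
\quad\Longrightarrow\quad
\log Sx(j) \approx \log C + \alpha\, j \log 2 .
\]
```

After the logarithm the dependence on $j$ is affine, which suits a classifier that models each class as an affine subspace.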

Page 39:

KTH-Tips

• 10 classes.

• 9 scales, 3 viewpoints, 3 illuminations = 81 images/class.

• Low resolution 200x200.

• No in-plane rotation.

• Data is split between training and testing.

• Results are averaged over 200 random splits.

Rotation invariance does not degrade accuracy.

Scale invariance increases accuracy.

Page 40:

UIUCTex

• 25 classes of 40 images.

• Higher resolution 640x480.

• Large, uncalibrated affine transformations.

• Large deformations.

Rotation invariance increases accuracy.

Scale invariance increases accuracy.

Page 41:

UMD

• 25 classes of 40 images.

• Similar to UIUC.

• Higher resolution 1280x960.

Rotation invariance increases accuracy.

Scale invariance increases accuracy.

Page 42:

Hyperparameters

• max scale for KTH-Tips, UIUCTex, UMD.

• scales per octave.

• orientations between

• full rotational invariance.

• 4 dilations for scale invariance.

• Mirror padding to avoid losing too much at boundaries.

State-of-the-art results on three datasets with almost the same hyperparameters.

Page 43:

Separable ConvNet

• The main difference between translation and joint scattering is the 3D convolution, which recombines the information from different paths.

• ConvNet 2000’s: 2D convolutions (LeNet)

• ConvNet 2010’s: 3D convolutions (AlexNet)

Page 44:

First 2 layers of AlexNet

[Figure: filters of the first layer and of the second layer]

Highly redundant along input depths: waste of capacity.

Page 45:

Separable Convolutions in ConvNets

[Diagram: vanilla layer: 3D conv; separable layer: 2D conv followed by 1D conv]

For a given capacity:

• Less to learn

• Less to compute

ImageNet ILSVRC2012: 1K classes, 1.2M images.

20% fewer steps-to-accuracy with Google’s AlexNet implementation.
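The idea of replacing a 3D convolution by a 2D convolution followed by a 1D convolution can be sketched as follows. This is a toy illustration, not the layer used in the reported experiment: it assumes a single output map, a rank-one 3D kernel (spatial filter times depth filter), "valid" borders, and illustrative function names.

```python
def vanilla_conv(x, kernel3d):
    """'3D conv': one output map, summing over depth and a k x k window."""
    depth, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(kernel3d[0])
    return [
        [sum(kernel3d[c][a][b] * x[c][i + a][j + b]
             for c in range(depth) for a in range(k) for b in range(k))
         for j in range(width - k + 1)]
        for i in range(height - k + 1)
    ]

def separable_conv(x, spatial, depth_filter):
    """'2D conv' with the same k x k filter on every depth slice, then '1D conv' along depth."""
    k = len(spatial)
    height, width = len(x[0]), len(x[0][0])
    filtered = [
        [[sum(spatial[a][b] * plane[i + a][j + b] for a in range(k) for b in range(k))
          for j in range(width - k + 1)]
         for i in range(height - k + 1)]
        for plane in x
    ]
    return [
        [sum(w * plane[i][j] for w, plane in zip(depth_filter, filtered))
         for j in range(width - k + 1)]
        for i in range(height - k + 1)
    ]

spatial = [[1.0, 2.0], [0.0, -1.0]]
depth_filter = [0.5, -1.0, 2.0]
# Rank-one 3D kernel: the case where both layers compute the same function.
kernel3d = [[[w * s for s in row] for row in spatial] for w in depth_filter]
x = [[[float((c + 1) * (i * 4 + j) % 7) for j in range(4)] for i in range(4)]
     for c in range(3)]

full = vanilla_conv(x, kernel3d)
fast = separable_conv(x, spatial, depth_filter)
params_vanilla = len(depth_filter) * len(spatial) ** 2    # 12 weights
params_separable = len(spatial) ** 2 + len(depth_filter)  # 7 weights
print(params_vanilla, params_separable)
```

The two outputs coincide while the separable form stores and multiplies fewer weights, which is one way to read "less to learn" and "less to compute". A general 3D kernel is not rank one, so a separable layer restricts the set of filters that can be represented.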

Page 46:

Conclusion

• Problem: the scattering transform (Mallat et al.) is a translation-invariant, informative and stable signal representation. How can its properties be extended to other, more complicated groups that affect natural images?

• Focus on the affine group for theory and on the rigid-motion group (translations and rotations) in applications.

• The separable scattering is the most straightforward way. It cascades scattering along the position and along the orientation parameters. It loses information about the joint distribution of the internal variables.

• The joint scattering recombines the internal variables of intermediate layers by cascading wavelet modulus operators on the geometric group. It is a tighter invariant.

• Proofs that these operators are unitary and invariant.

• Fast algorithms.

• Texture classification: state-of-the-art on most datasets.

• Generic object classification: more efficient convolutions in ConvNets.