Deep Learning and Vision Jon Shlens Google Research 28 April 2017


Page 1

Deep Learning and Vision
Jon Shlens

Google Research 28 April 2017

Page 2

1. A brief history and motivation

2. Deep learning for vision

• What is deep learning?

• Convolutions and neural networks

3. Advances in neural networks

• Nonlinearities: example of batch normalization

• Understanding: example of gradient propagation

4. Conclusions

Agenda

Page 3

1. A brief history and motivation

2. Deep learning for vision

• What is deep learning?

• Convolutions and neural networks

3. Advances in neural networks

• Nonlinearities: example of batch normalization

• Understanding: example of gradient propagation

4. Conclusions

Agenda

Page 4

http://dspace.mit.edu/handle/1721.1/6125

The hubris of artificial intelligence

Page 5

• For decades we tried to write down every possible rule for everyday tasks —> impossible

• Everyday tasks we consider blindingly obvious have proved exceedingly difficult for computers.

‘Simple’ problems proved most difficult.

cat?

Page 6

Machine learning applied everywhere.

• The last decade has shown that if we teach computers to perform a task, they can perform it far better.

machine translation, speech recognition, face recognition, time series analysis, molecular activity prediction, image recognition, road hazard detection, object detection, optical character recognition, motor planning, motor activity planning, syntax parsing, language understanding, …

Page 7

Large-scale academic competition focused on predicting 1,000 object classes (~1.2M images).

• electric ray

• barracuda

• coho salmon

• tench

• goldfish

• sawfish

• smalltooth sawfish

• guitarfish

• stingray

• roughtail stingray

• ...

The computer vision competition:

ImageNet: A large-scale hierarchical image database, J Deng et al (2009)


Page 8

History of techniques in ImageNet Challenge

ImageNet 2010
• Locality constrained linear coding + SVM (NEC & UIUC)
• Fisher kernel + SVM (Xerox Research Center Europe)
• SIFT features + LI2C (Nanyang Technological Institute)
• SIFT features + k-Nearest Neighbors (Laboratoire d'Informatique de Grenoble)
• Color features + canonical correlation analysis (National Institute of Informatics, Tokyo)

ImageNet 2011
• Compressed Fisher kernel + SVM (Xerox Research Center Europe)
• SIFT bag-of-words + VQ + SVM (University of Amsterdam & University of Trento)
• SIFT + ? (ISI Lab, Tokyo University)

ImageNet 2012
• Deep convolutional neural network (University of Toronto)
• Discriminatively trained DPMs (University of Oxford)
• Fisher-based SIFT features + SVM (ISI Lab, Tokyo University)

Page 9

Examples of artificial vision in action**

• Good fine-grained classification: hibiscus, dahlia
• Good generalization: both images recognized as “meal”
• Sensible errors: snake, dog

** Trained a model for whole-image recognition using the Inception-v3 architecture.

Page 10

1. A brief history and motivation

2. Deep learning for vision

• What is deep learning?

• Convolutions and neural networks

3. Advances in neural networks

• Nonlinearities: example of batch normalization

• Understanding: example of gradient propagation

4. Conclusions

Agenda

Page 11

History of techniques in ImageNet Challenge

ImageNet 2010
• Locality constrained linear coding + SVM (NEC & UIUC)
• Fisher kernel + SVM (Xerox Research Center Europe)
• SIFT features + LI2C (Nanyang Technological Institute)
• SIFT features + k-Nearest Neighbors (Laboratoire d'Informatique de Grenoble)
• Color features + canonical correlation analysis (National Institute of Informatics, Tokyo)

ImageNet 2011
• Compressed Fisher kernel + SVM (Xerox Research Center Europe)
• SIFT bag-of-words + VQ + SVM (University of Amsterdam & University of Trento)
• SIFT + ? (ISI Lab, Tokyo University)

ImageNet 2012
• Deep convolutional neural network (University of Toronto)
• Discriminatively trained DPMs (University of Oxford)
• Fisher-based SIFT features + SVM (ISI Lab, Tokyo University)

Page 12

• Multi-layer perceptrons trained with back-propagation are ideas known since the 1980s.

Deep convolutional neural networks

ImageNet Classification with Deep Convolutional Neural Networks, A Krizhevsky, I Sutskever, G Hinton (2012)

Backpropagation applied to handwritten zip code recognition, Y LeCun et al (1990)

Page 13

• Winning network contained 60M parameters.

• Achieving scale in compute and data is critical.

• large academic data sets

• SIMD hardware (e.g. GPUs, SSE instruction sets)

Convolutional neural networks, revisited.

ImageNet Classification with Deep Convolutional Neural Networks, A Krizhevsky, I Sutskever, G Hinton (2012)

Page 14

“Deep learning” = artificial neural networks

“cat”

Loosely based on (what little) we know about the brain

What is deep learning?

• Hierarchical composition of simple mathematical functions

Untangling invariant object recognition, J DiCarlo and D Cox (2007)

Page 15

“Deep learning” = artificial neural networks

• Hierarchical composition of simple mathematical functions

“cat”


What is deep learning?

Loosely inspired by (what little) we know about the brain

Untangling invariant object recognition, J DiCarlo and D Cox (2007)

Page 16

A toy model of a neuron: “perceptron”

The perceptron: a probabilistic model for information storage and organization in the brain, F Rosenblatt (1958)

• no spikes

• no recurrence or feedback *

• no dynamics or state *

• no biophysics

$y = f\left(\sum_i w_i x_i + b\right)$

Simplify the neuron to a sum over weighted inputs and a nonlinear activation function.

f(z) = max(0, z)
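As a concrete illustration (a minimal sketch of my own, not from the slides), this perceptron unit can be written in a few lines of numpy; the inputs, weights, and bias below are arbitrary placeholder values:

```python
import numpy as np

def relu(z):
    # f(z) = max(0, z): the rectified linear nonlinearity shown above
    return np.maximum(0.0, z)

def perceptron(x, w, b):
    # y = f(sum_i w_i * x_i + b): weighted sum of inputs plus bias, then nonlinearity
    return relu(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs (placeholder values)
w = np.array([0.1, 0.4, -0.2])   # weights
b = 0.05                          # bias
print(perceptron(x, w, b))        # a single non-negative activation
```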

Page 17

Page 18

Employing a network for a task.

• A network is a hierarchical composition of nonlinear functions.

$y = f(f(\cdots f(x)))$

• The output of the network is a real-valued vector $y$, with one entry per class label $j$ (cat, dog, car, truck, cow, bicycle); in this example the largest entry corresponds to “dog”.

Page 19

Example: how to classify with a network

Step 1: Convert the network output to a probability distribution with the softmax function.

$P(j) = \frac{\exp(y_j)}{\sum_{j'} \exp(y_{j'})}$

(Figure: the raw outputs $y_j$ and the resulting probabilities $P(j)$, between 0 and 1, plotted over the labels cat, dog, car, truck, cow, bicycle.)
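A minimal numpy sketch of this step (the class labels and raw scores below are placeholders, not values from the slides):

```python
import numpy as np

def softmax(y):
    # Subtracting the max is a standard numerical-stability trick; it does not
    # change the result because softmax is invariant to adding a constant to every y_j.
    e = np.exp(y - np.max(y))
    return e / np.sum(e)

labels = ["cat", "dog", "car", "truck", "cow", "bicycle"]
y = np.array([1.2, 3.1, -0.5, 0.3, 0.0, -1.0])   # raw network outputs (placeholder)
p = softmax(y)
print(dict(zip(labels, p.round(3))))             # probabilities that sum to 1
```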

Page 20

Example: how to classify with a network

Step 2: Minimize the cross-entropy loss between the predicted distribution and a one-hot target distribution.

(Figure: predicted distribution and one-hot target distribution shown as bar charts over the labels cat, dog, car, truck, cow, bicycle, with values between 0 and 1.)

• The cross-entropy loss is the KL divergence between the predicted and target distributions:

$\text{loss} = \sum_x p(x) \log \frac{p(x)}{q(x)}$

$p(x)$: predicted distribution, $q(x)$: target distribution
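A minimal numpy sketch of the loss for a one-hot target (variable names and values are mine; with a one-hot target, the cross entropy $-\sum_x \text{target}(x)\log \text{pred}(x)$ and the KL divergence coincide):

```python
import numpy as np

def cross_entropy(target, predicted, eps=1e-12):
    # -sum_x target(x) * log predicted(x); eps guards against log(0)
    return -np.sum(target * np.log(predicted + eps))

# one-hot target for "dog" and a predicted distribution over the six classes
target    = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
predicted = np.array([0.05, 0.70, 0.05, 0.10, 0.05, 0.05])
print(cross_entropy(target, predicted))   # ~0.357; approaches 0 as predicted -> target
```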

Page 21

Gradient descent with back-propagation.

• Calculate the partial derivative of the loss with respect to each parameter in order to minimize the objective function via gradient descent.

$\frac{\partial\,\text{loss}}{\partial w_i}$

Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, P Werbos (1974)

Learning Internal Representations by Error Propagation, D Rumelhart, G Hinton, R Williams (1986)

• For weights buried inside the network, employ a clever factorization of the chain rule, i.e. back-propagation.
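A toy sketch of the idea (my own example, not from the slides): a single linear unit trained on a squared-error loss. The gradients below are the chain rule written out by hand; back-propagation is the same bookkeeping applied layer by layer through a deep network.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))              # 100 examples, 3 inputs
true_w = np.array([1.0, -2.0, 0.5])
t = x @ true_w + 0.3                       # targets generated by a known linear rule

w, b, lr = np.zeros(3), 0.0, 0.1
for step in range(200):
    y = x @ w + b                          # forward pass
    loss = np.mean((y - t) ** 2)           # squared-error objective
    dloss_dy = 2.0 * (y - t) / len(t)      # chain rule: d loss / d y
    dloss_dw = x.T @ dloss_dy              # d loss / d w_i
    dloss_db = np.sum(dloss_dy)            # d loss / d b
    w -= lr * dloss_dw                     # gradient-descent updates
    b -= lr * dloss_db

print(loss, w, b)                          # loss near 0, w near true_w, b near 0.3
```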

Page 22

Optimization is highly non-convex.

Note that deep networks operate in O(1M) dimensions.

(Figure: loss surface plotted over two weights, weight 1 and weight 2.)

Page 23

playground.tensorflow.org

Page 24

The E. coli of image recognition: MNIST

Gradient-based learning applied to document recognition, Y LeCun, L Bottou, Y Bengio, P Haffner (1998)

http://yann.lecun.com/exdb/mnist/

(Diagram: handwritten digit image → machine learning system, e.g. a neural network → “4”.)

Page 25

Multi-layer perceptron on MNIST.

• Note that weights grow as the square of the number of pixels.

(Diagram: handwritten zip code image, P = 28 pixels per side → fully connected layer (N = 100) → logistic classifier (M = 10) → “4”.)

# weights (input → fully connected) = N × P² = 78,400
# weights (fully connected → classifier) = N × M = 1,000

• Consider that an iPhone camera image has P ≈ 2000 pixels per side; the fully connected layer alone would then require N × P² ≈ 400 million weights.

M = # classes

N = # hidden units

Page 26

Statistics of natural images: Scaling in the woods, D Ruderman and W Bialek (1994)

Natural image statistics and neural representation, E Simoncelli and B Olshausen (2001)

… translation, cropping, dilation, contrast, rotation, scale, brightness, …

Natural image statistics obey invariances.

Page 27

Translation invariance —> convolutions

• Models of natural image statistics begin with a convolutional filter bank.

Page 28

interlude for convolutions

Page 29

original

https://docs.gimp.org/en/plug-in-convmatrix.html

filter (3 x 3) identity

0 0 0

0 1 0

0 0 0
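A minimal sketch of the operation behind these filter slides (my own example; the vertical-edge kernel is a common choice and an assumption, not necessarily the exact GIMP filter):

```python
import numpy as np

def conv2d(image, kernel):
    # "Valid" 2D convolution (strictly, cross-correlation, as used in CNNs):
    # slide the kernel over the image and take a weighted sum at each position.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

identity = np.array([[0., 0., 0.],
                     [0., 1., 0.],
                     [0., 0., 0.]])          # the 3 x 3 identity filter above
vertical_edge = np.array([[1., 0., -1.],
                          [1., 0., -1.],
                          [1., 0., -1.]])    # a common vertical-edge kernel (assumption)

image = np.random.rand(8, 8)
print(np.allclose(conv2d(image, identity), image[1:-1, 1:-1]))  # True: identity preserves the image
print(conv2d(image, vertical_edge).shape)                        # (6, 6) map of vertical edges
```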

Page 30

original

https://docs.gimp.org/en/plug-in-convmatrix.html

filter (5 x 5) blur

Page 31

original

https://docs.gimp.org/en/plug-in-convmatrix.html

filter (5 x 5) sharpen

Page 32

original

https://docs.gimp.org/en/plug-in-convmatrix.html

filter (3 x 3) vertical edge detector

Page 33

original

https://docs.gimp.org/en/plug-in-convmatrix.html

filter (3 x 3) all edge detector

Page 34

interlude for convolutions

Page 35

Multi-layer perceptron on MNIST.

• Note that weights grow as the square of the number of pixels!

(Diagram: handwritten zip code image, P = 28 pixels per side → fully connected layer (N = 100) → logistic classifier (M = 10) → “4”.)

# weights (input → fully connected) = N × P² = 78,400
# weights (fully connected → classifier) = N × M = 1,000

Page 36

Convolutional neural network on MNIST.

• Note that the number of model parameters is largely independent of image size.

(Diagram: handwritten zip code image, P = 28 → convolutional layer with N = 100 filters, each F × F = 5 × 5 → logistic classifier (M = 10) → “4”.)

# weights (convolutional layer) = N × F² = 2,500
# weights (classifier) = N × M × K = 1,000 K, where K is the number of spatial positions fed to the classifier
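The parameter counts on these two slides as plain arithmetic (the 2000-pixel figure below refers back to the earlier iPhone example):

```python
# Fully connected vs. convolutional first layer on MNIST-sized input
P, N, M, F = 28, 100, 10, 5        # image side, hidden units / filters, classes, filter size

fully_connected = N * P**2          # 78,400 weights: grows with the image area
convolutional   = N * F**2          #  2,500 weights: independent of the image size
print(fully_connected, convolutional)

# Same layers on an iPhone-sized input (~2000 pixels per side)
print(N * 2000**2)                  # 400,000,000 weights for the fully connected layer
print(N * F**2)                     # still 2,500 for the convolutional layer
```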

Page 37

Generalizing convolutions in depth.

(Figure: a filter bank maps input activations to output activations. Example: a grayscale image has input depth 1, an RGB image input depth 3.)

Page 38

Generalizing convolutions in depth.

(Figure: a filter bank maps input activations to output activations with some output depth. Examples: an edge-detector filter bank and a convolutional network layer, each with its own output depth.)

• Input and output depth are independent parameters and need not be equal.
• Convolutional neural networks operate with depths up to 1024.

Page 39

The first convolutional neural network.

Backpropagation applied to handwritten zip code recognition, Y LeCun et al (1989)

(Diagram: input image → convolutional layer (N = 12) → convolutional layer (N = 12) → fully connected layer (N = 30) → logistic classifier (M = 10) → “4”.)

Page 40

• Similar to the original CNN architecture, but deeper and larger (70K -> 60M parameters).

• More nonlinearities and regularization.

Convolutional neural networks, revisited

ImageNet Classification with Deep Convolutional Neural Networks, A Krizhevsky, I Sutskever, G Hinton (2012)

Backpropagation applied to handwritten zip code recognition, Y LeCun et al (1990)

Page 41

Steady progress in network architectures.

year  entry                           place  top-5 error
2012  SuperVision                     1st    16.4%
2013  Clarifai                        1st    11.5%
2014  VGG                             2nd    7.3%
2014  GoogLeNet / Inception           1st    6.6%
2014  Andrej Karpathy                 n/a    5.1%
2015  Batch Normalization Inception   n/a    4.8%
2015  Inception v3                    2nd    3.6%
2015  ResNet                          1st    3.6%
2016  Inception-ResNet                n/a    3.1%

Page 44

Advances in network architectures

Animation by Dan Mané

Page 45

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, C Szegedy, S Ioffe, V Vanhoucke (2016)

Deep Residual Learning for Image Recognition, K He, X Zhang, S Ren, J Sun (2015)

Rethinking the Inception Architecture for Computer Vision, C Szegedy, V Vanhoucke, S Ioffe, J Shlens, Z Wojna (2015)

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, S Ioffe and C Szegedy (2015)

What I learned from competing against a ConvNet on ImageNet, A Karpathy (2014)

Very Deep Convolutional Networks for Large-Scale Image Recognition, K Simonyan and A Zisserman (2015)

Going Deeper with Convolutions, C Szegedy et al (2014)

Visualizing and Understanding Convolutional Networks, M Zeiler and R Fergus (2013)

ImageNet Classification with Deep Convolutional Neural Networks, A Krizhevsky, I Sutskever, G Hinton (2012)

Scalable Multiclass Object Categorization with Fisher Based Features, N Gunji et al (2012)

Compressed Fisher vectors for Large Scale Visual Recognition, F Perronnin, J Sanchez (2011)

Page 46

1. A brief history and motivation

2. Deep learning for vision

• What is deep learning?

• Convolutions and neural networks

3. Advances in neural networks

• Nonlinearities: example of batch normalization

• Understanding: example of gradient propagation

4. Conclusions

Agenda

Page 47

• Traditional machine learning must contend with covariate shift between data sets.

• Covariate shifts must be mitigated through domain adaptation.

Covariate shifts are problematic in machine learning

blog.bigml.com

Page 48

• Traditional machine learning must contend with covariate shift between data sets.

• Covariate shifts must be mitigated through domain adaptation.

(Figure: distribution of layer i activations at time = 1 versus time = N of training.)

Covariate shifts occur between network layers.

Page 49

• Covariate shifts occur across layers in a deep network.

• Performing domain adaptation or whitening is impractical in an online setting.

Covariate shifts occur between network layers.

(Figure: 15%, 50%, and 85% percentiles of a logistic unit's activation over time during MNIST training.)

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, S Ioffe and C Szegedy (2015)

Page 50

• Adagrad

• whitening input data

• building invariances through normalization

• regularizing the network (e.g. dropout, maxout)

I Goodfellow et al (2013); N Srivastava et al (2014)

(Figure: distribution of layer i activations at time = 1 versus time = N of training.)

Previous methods for addressing covariate shifts

Page 51

1. Normalize the activations within a mini-batch.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, S Ioffe and C Szegedy (2015)

Mitigate covariate shift via batch normalization.

For a mini-batch of activations $\{x_i\}$:

$\mu = \frac{1}{n}\sum_i x_i$

$\sigma^2 = \frac{1}{n}\sum_i (x_i - \mu)^2$

$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$

$y_i = \gamma \hat{x}_i + \beta$

2. Learn a scale and shift $(\gamma, \beta)$ for each layer as parameters.
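A forward-pass-only numpy sketch of these equations (my own illustration: training-time statistics only; in a real network γ and β are learned by back-propagation, and a running mean and variance are kept for use at test time):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (batch, features). Normalize each feature over the mini-batch,
    # then apply the learned scale (gamma) and shift (beta).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = 3.0 * np.random.randn(32, 4) + 7.0    # activations with an arbitrary scale and shift
gamma, beta = np.ones(4), np.zeros(4)     # learnable parameters, at their initial values
y = batch_norm(x, gamma, beta)
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 mean and ~1 std per feature
```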

Page 52

• The canonical module of a perceptron is updated:

• Activations are more stable over training.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, S Ioffe and C Szegedy (2015)

Batch normalization stabilizes training.

(Figure: 15%, 50%, and 85% percentiles of hidden layer activations on MNIST over the course of training.)

$y = f\left(\sum_i w_i x_i + b\right) \;\longrightarrow\; y = f\left(\mathrm{BatchNorm}\left(\sum_i w_i x_i\right)\right)$

Page 53

• CNNs train faster with fewer training samples (~15x).

• Employ faster learning rates and less network regularization.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, S Ioffe and C Szegedy (2015)

Batch normalization speeds up training enormously.

(Figure: precision @ 1 as a function of the number of training mini-batches.)

Page 54

1. A brief history and motivation

2. Deep learning for vision

• What is deep learning?

• Convolutions and neural networks

3. Advances in neural networks

• Nonlinearities: example of batch normalization

• Understanding: example of gradient propagation

4. Conclusions

Agenda

Page 55

For training a network, we have so far focused on how to change the parameters with respect to a loss function.

Switching to other types of gradients

An important distinction:
• the former provides an update that “lives” in weight space
• the latter provides an update that “lives” in image space

The rest of this talk focuses instead on how an activation or the loss function depends on the image.

$\frac{\partial}{\partial w_i}$ versus $\frac{\partial}{\partial\,\text{image}}$
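A hedged sketch of computing ∂loss/∂image for a trained classifier, assuming TensorFlow 2.x and the Keras pretrained Inception-v3 weights (neither is specified in the slides; the random input is a stand-in for a real, properly preprocessed photo):

```python
import tensorflow as tf

model = tf.keras.applications.InceptionV3(weights="imagenet")     # pretrained classifier
image = tf.Variable(tf.random.uniform((1, 299, 299, 3)))           # stand-in for a real photo
# a real photo would first go through tf.keras.applications.inception_v3.preprocess_input
label = 208                                                          # an arbitrary ImageNet class index

with tf.GradientTape() as tape:
    probs = model(image, training=False)                            # forward pass
    loss = tf.keras.losses.sparse_categorical_crossentropy(
        [label], probs, from_logits=False)
grad = tape.gradient(loss, image)                  # same shape as the image: d loss / d pixel
saliency = tf.reduce_max(tf.abs(grad), axis=-1)    # collapse color channels into one map
print(saliency.shape)                              # (1, 299, 299)
```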

Page 56

(Figure: example activations at layer 3 and layer 5.)

Gradient propagation to find responsible pixels

• Which pixels elicit large activation values within an image?

• Examine activations at middle layers in a trained network.

Page 57

Visualizing and Understanding Convolutional Networks, M Zeiler and R Fergus (2013)

layer 3

Gradient propagation to find responsible pixels

Page 58

Visualizing and Understanding Convolutional Networks, M Zeiler and R Fergus (2013)

layer 5

Gradient propagation to find responsible pixels

Page 59

Gradient propagation for distorting images.

(Diagram: an image from http://mscoco.org → Inception-v3 → “dog”.)

• What happens if we distort the original image to amplify the label using the gradient signal?

Page 60

Inceptionism: Going Deeper into Neural Networks, A Mordvintsev, C Olah and M Tyka (2015)

“dog”

Gradient propagation for distorting images.

… but what if we used the wrong image?

• What happens if we distort the original image to amplify the label using the gradient signal?

Page 61

Inceptionism: Going Deeper into Neural Networks, A Mordvintsev, C Olah and M Tyka (2015)

“dog”

• Apply gradient distortion, feed back the distorted image into the network and iterate.

Gradient propagation for distorting images.
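A hedged sketch of that loop, again assuming TensorFlow 2.x and the Keras Inception-v3 weights (the class index, step size, and iteration count are placeholders; exaggerated versions of this procedure produce the “Inceptionism” images on the next slides):

```python
import tensorflow as tf

model = tf.keras.applications.InceptionV3(weights="imagenet")
image = tf.Variable(tf.random.uniform((1, 299, 299, 3)))    # stand-in for the starting photo
target_class = 208                                           # arbitrary ImageNet class index
step_size = 0.01

for _ in range(20):
    with tf.GradientTape() as tape:
        probs = model(image, training=False)
        score = probs[0, target_class]                       # probability of the target label
    grad = tape.gradient(score, image)                       # d score / d image
    image.assign_add(step_size * grad / (tf.norm(grad) + 1e-8))   # gradient-ascent step on the image
    image.assign(tf.clip_by_value(image, 0.0, 1.0))               # keep pixel values in range
```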

Page 62

http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html

Page 63

http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html

A. Mordvintsev, C. Olah and M. Tyka

Page 64

A Neural Algorithm of Artistic Style

A Neural Algorithm of Artistic Style, L Gatys, A Ecker, M Bethge (2015)

Page 65

https://github.com/kaishengtai/neuralart

A Neural Algorithm of Artistic Style, L Gatys, A Ecker, M Bethge (2015)

Page 66

Gradient propagation for breaking things.

(Diagram: image → Inception-v3 → “dog”, with the gradient $\frac{\partial\,\text{loss}}{\partial\,\text{image}}$ propagated back to the input pixels.)

Intriguing properties of neural networks, C Szegedy et al (2014)

Explaining and Harnessing Adversarial Examples, I Goodfellow, J Shlens and C Szegedy (2015)

$\frac{\partial\,\text{loss}}{\partial\,\text{image}}$ tells us which pixels are sensitive to the label, and how to change those pixels to decrease the probability of the label.

Page 67

• Constrained optimization to find adversarial adjustment to an image (L1 norm).

• Robust across trained networks, network architectures and other machine learning systems.

Intriguing properties of neural networks, C Szegedy et al (2014)

Explaining and Harnessing Adversarial Examples, I Goodfellow, J Shlens and C Szegedy (2015)

Gradient propagation for breaking things.
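A minimal sketch of one construction from the cited papers, the fast gradient sign method of Goodfellow, Shlens & Szegedy (the gradient array below is a random stand-in for ∂loss/∂image computed from a real network):

```python
import numpy as np

def fgsm(image, grad_loss_wrt_image, epsilon=0.007):
    # Fast gradient sign method: take a small step in the direction that
    # increases the loss for the true label, which decreases the probability
    # the network assigns to that label.
    adversarial = image + epsilon * np.sign(grad_loss_wrt_image)
    return np.clip(adversarial, 0.0, 1.0)          # keep valid pixel values

image = np.random.rand(299, 299, 3)                # stand-in for a real photo
grad = np.random.randn(299, 299, 3)                # stand-in for d loss / d image
adversarial = fgsm(image, grad)
print(np.max(np.abs(adversarial - image)))         # perturbation bounded by epsilon
```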

Page 68

1. A brief history and motivation

2. Deep learning for vision

• What is deep learning?

• Convolutions and neural networks

3. Advances in neural networks

• Nonlinearities: example of batch normalization

• Understanding: example of gradient propagation

4. Conclusions

Agenda

Page 69

Quick Start Guide

1. Purchase a desktop with a fast GPU.

2. Download an open-source library for deep learning.

3. Download a pre-trained model for a similar vision task.

4. Retrain (fine-tune) the network for your particular data set.

Online resources: http://www.tensorflow.org http://cs231n.github.io/convolutional-networks/
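A hedged sketch of steps 3 and 4, assuming TensorFlow 2.x with Keras (the class count and the `train_ds` dataset are placeholders for your own data pipeline):

```python
import tensorflow as tf

# Step 3: start from an ImageNet-pretrained Inception-v3 without its classifier head.
base = tf.keras.applications.InceptionV3(weights="imagenet", include_top=False,
                                         pooling="avg", input_shape=(299, 299, 3))
base.trainable = False                              # freeze the pretrained features at first

# Step 4: add a new head for your classes and retrain on your data set.
num_classes = 5                                     # placeholder: your data set's classes
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=5)                     # train_ds: your tf.data pipeline
# base.trainable = True                             # optionally unfreeze and fine-tune further
```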

Page 70

Google Brain Residency Program

One year immersion program in deep learning research
● First class started six weeks ago; planning for next year's class is underway

Learn to conduct deep learning research with experts on our team
● Fixed one-year employment with salary, benefits, ...

● Goal after one year is to have conducted several research projects

● Interesting problems, TensorFlow, and access to computational resources

g.co/brainresidency

Page 71

Google Brain Residency Program

Who should apply?
● people with BSc, MSc or PhD, ideally in CS, mathematics or statistics

● completed coursework in calculus, linear algebra, and probability, or equiv.

● programming experience

● motivated, hard working, and have a strong interest in deep learning

g.co/brainresidency

Page 72