Perceptual and Sensory Augmented Computing
Computer Vision WS 16/17
Computer Vision – Lecture 15
Deep Learning for Object Categorization
21.12.2016
Bastian Leibe
RWTH Aachen
http://www.vision.rwth-aachen.de
Course Outline
• Image Processing Basics
• Segmentation & Grouping
• Object Recognition
• Object Categorization I
Sliding Window based Object Detection
• Local Features & Matching
Local Features – Detection and Description
Recognition with Local Features
Indexing & Visual Vocabularies
• Object Categorization II
Bag-of-Words Approaches & Part-based Approaches
Deep Learning Methods
• 3D Reconstruction
Recap: Part-Based Models
• Fischler & Elschlager 1973
• Model has two components
Parts (2D image fragments)
Structure (configuration of parts)
Recap: Implicit Shape Model – Representation
• Learn appearance codebook
Extract local features at interest points
Cluster features to form an appearance codebook
• Learn spatial distributions
Match codebook to training images
Record matching positions on object
Training images
(+reference segmentation)
Appearance codebook
Spatial occurrence distributions (x, y, s)
+ local figure-ground labels
Recap: Deformable Part-Based Model
Root filters (coarse resolution)
Part filters (finer resolution)
Deformation models
Slide credit: Pedro Felzenszwalb
Recap: Object Hypothesis
• Multiscale model captures features at two resolutions
Score of an object hypothesis is the sum of filter scores minus deformation costs.
Score of a filter: dot product of the filter with the HOG features underneath it.
Slide credit: Pedro Felzenszwalb
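The scoring rule above can be sketched in a few lines; the arrays and values below are illustrative stand-ins, not the authors' implementation:

```python
import numpy as np

def filter_score(filt, hog_patch):
    # Score of a filter: dot product of the filter with the
    # HOG features underneath it.
    return float(np.dot(filt.ravel(), hog_patch.ravel()))

def hypothesis_score(filter_scores, deformation_costs):
    # Score of an object hypothesis: sum of filter scores
    # minus deformation costs.
    return sum(filter_scores) - sum(deformation_costs)

# Toy example: a 6x6 root filter over 31 HOG orientation bins
rng = np.random.default_rng(0)
filt = rng.normal(size=(6, 6, 31))
patch = rng.normal(size=(6, 6, 31))
root = filter_score(filt, patch)
total = hypothesis_score([root, 1.2, 0.8], deformation_costs=[0.3, 0.1])
```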
Recap: Score of a Hypothesis
Slide credit: Pedro Felzenszwalb
Topics of This Lecture
• Deep Learning Motivation
• Convolutional Neural Networks
Convolutional Layers
Pooling Layers
Nonlinearities
• CNN Architectures
LeNet
AlexNet
VGGNet
GoogLeNet
• Applications
We’ve finally got there!
Deep Learning
Traditional Recognition Approach
• Characteristics
Features are not learned, but engineered
Trainable classifier is often generic (e.g., SVM)
Many successes in 2000-2010.
Slide credit: Svetlana Lazebnik
Traditional Recognition Approach
• Features are key to recent progress in recognition
Multitude of hand-designed features currently in use
SIFT, HOG, …
Where next? Better classifiers? Or keep building more features?
DPM
[Felzenszwalb
et al., PAMI’07]
Dense SIFT+LBP+HOG BOW Classifier
[Yan & Huan ‘10]
(Winner of PASCAL 2010 Challenge)
Slide credit: Svetlana Lazebnik
What About Learning the Features?
• Learn a feature hierarchy all the way from pixels to
classifier
Each layer extracts features from the output of previous layer
Train all layers jointly
Slide credit: Svetlana Lazebnik
“Shallow” vs. “Deep” Architectures
Slide credit: Svetlana Lazebnik
Background: Perceptrons
Slide credit: Svetlana Lazebnik
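As a concrete refresher, a minimal perceptron with the classic update rule can be sketched as follows; the AND task is just an illustrative choice:

```python
import numpy as np

def perceptron(x, w, b):
    # A perceptron thresholds a weighted sum of its inputs.
    return 1 if np.dot(w, x) + b > 0 else 0

def train(samples, labels, epochs=25, lr=1.0):
    # Classic perceptron learning rule: nudge the weights toward
    # each misclassified sample until the data is separated.
    w, b = np.zeros(len(samples[0])), 0.0
    for _ in range(epochs):
        for x, t in zip(samples, labels):
            err = t - perceptron(x, w, b)
            w += lr * err * np.asarray(x, dtype=float)
            b += lr * err
    return w, b

# Linearly separable example: logical AND
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train(X, y)
```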
Inspiration: Neuron Cells
Slide credit: Svetlana Lazebnik, Rob Fergus
Background: Multi-Layer Neural Networks
• Nonlinear classifier
Training: find network weights w that minimize the error between the true
training labels tn and the estimated labels fw(xn), e.g. E(w) = Σn (fw(xn) - tn)².
Minimization can be done by gradient descent, provided f is differentiable.
– Training method: error backpropagation.
Slide credit: Svetlana Lazebnik
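A numeric sketch of this objective, on hypothetical data; a single linear layer stands in for the network so the gradient stays one line, and backpropagation extends the same chain-rule idea through deeper networks:

```python
import numpy as np

# Minimize E(w) = sum_n (f_w(x_n) - t_n)^2 by gradient descent.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
t = X @ w_true                          # noiseless training targets

w = np.zeros(3)
for _ in range(500):
    residual = X @ w - t                # f_w(x_n) - t_n for all n
    grad = 2 * X.T @ residual / len(X)  # dE/dw (averaged over samples)
    w -= 0.1 * grad                     # one gradient-descent step

final_error = float(np.sum((X @ w - t) ** 2))
```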
Hubel/Wiesel Architecture
• D. Hubel, T. Wiesel (1959, 1962, Nobel Prize 1981)
Visual cortex consists of a hierarchy of simple, complex, and
hyper-complex cells
Slide credit: Svetlana Lazebnik, Rob Fergus
Convolutional Neural Networks (CNN, ConvNet)
• Neural network with specialized connectivity structure
Stack multiple stages of feature extractors
Higher stages compute more global, more invariant features
Classification layer at the end
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to
document recognition, Proceedings of the IEEE 86(11): 2278–2324, 1998.
Slide credit: Svetlana Lazebnik
Topics of This Lecture
• Deep Learning Motivation
• Convolutional Neural Networks
Convolutional Layers
Pooling Layers
Nonlinearities
• CNN Architectures
LeNet
AlexNet
VGGNet
GoogLeNet
• Applications
Convolutional Networks: Structure
• Feed-forward feature extraction
1. Convolve input with learned filters
2. Non-linearity
3. Spatial pooling
4. (Normalization)
• Supervised training of convolutional
filters by back-propagating
classification error
Slide credit: Svetlana Lazebnik
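Steps 1-3 of this pipeline can be sketched for a single channel in NumPy; the filter values below are illustrative, not learned:

```python
import numpy as np

def conv2d(img, k):
    # 1. Convolve (valid mode): slide the filter over the image.
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

def relu(x):
    # 2. Non-linearity.
    return np.maximum(0.0, x)

def max_pool(x, s=2):
    # 3. Spatial pooling: take the max of each s-by-s block.
    H, W = x.shape
    x = x[:H - H % s, :W - W % s]
    return x.reshape(x.shape[0] // s, s, x.shape[1] // s, s).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)
k = np.array([[-1.0, 0.0], [0.0, 1.0]])   # simple diagonal-difference filter
feat = max_pool(relu(conv2d(img, k)))
```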
Convolutional Networks: Intuition
• Fully connected network
E.g. 1000£1000 image
1M hidden units
1T parameters!
• Ideas to improve this
Spatial correlation is local
Image source: Yann LeCun. Slide adapted from Marc’Aurelio Ranzato
Convolutional Networks: Intuition
• Locally connected net
E.g., 1000×1000 image
1M hidden units, 10×10 receptive fields
→ 100M parameters!
• Ideas to improve this
Spatial correlation is local
Want translation invariance
Image source: Yann LeCun. Slide adapted from Marc’Aurelio Ranzato
Convolutional Networks: Intuition
• Convolutional net
Share the same parameters
across different locations
Convolutions with learned
kernels
Image source: Yann LeCun. Slide adapted from Marc’Aurelio Ranzato
Convolutional Networks: Intuition
• Convolutional net
Share the same parameters
across different locations
Convolutions with learned
kernels
• Learn multiple filters
E.g., 1000×1000 image
100 filters, 10×10 filter size
→ 10k parameters
• Result: response map of size 1000×1000×100
Only costs memory, not parameters!
Image source: Yann LeCun. Slide adapted from Marc’Aurelio Ranzato
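The parameter counts on these intuition slides are worth checking explicitly:

```python
# Checking the parameter counts from the intuition slides above.
H = W = 1000                     # 1000x1000 input image
hidden = 10**6                   # 1M hidden units

full_params = H * W * hidden     # fully connected: one weight per
                                 # (pixel, unit) pair -> 10^12 (1T)
local_params = hidden * 10 * 10  # 10x10 receptive field per unit -> 100M
conv_params = 100 * 10 * 10      # 100 shared 10x10 filters -> 10k
response_map = H * W * 100       # output activations: costs memory,
                                 # not parameters
```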
Important Conceptual Shift
• Before: activations formed flat feature vectors
• Now: activations are arranged in 3D volumes (width × height × depth)
Slide credit: FeiFei Li, Andrej Karpathy
Convolution Layers
• Note: Connectivity is
Local in space (5×5 inside 32×32)
But full in depth (all 3 depth channels)
Slide adapted from FeiFei Li, Andrej Karpathy
Hidden neuron in next layer
Example image: 32×32×3 volume
Before: full connectivity → 32×32×3 weights
Now: local connectivity
One neuron connects to, e.g., a 5×5×3 region.
Only 5×5×3 shared weights.
Convolution Layers
• All Neural Net activations arranged in 3 dimensions
Multiple neurons all looking at the same input region,
stacked in depth
Slide adapted from FeiFei Li, Andrej Karpathy
Convolution Layers
• All Neural Net activations arranged in 3 dimensions
Multiple neurons all looking at the same input region,
stacked in depth
Form a single [1£1£depth] depth column in output volume.
Slide credit: FeiFei Li, Andrej Karpathy
Convolution Layers
• Replicate this column of hidden neurons across space, with some stride.
Example: 7×7 input, 3×3 connectivity, stride 1 → 5×5 output
What about stride 2? → 3×3 output
• In practice, it is common to zero-pad the border.
Preserves the size of the input spatially.
Slide credit: FeiFei Li, Andrej Karpathy
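The 7×7 examples above follow the standard output-size formula, which the slide implies but does not state explicitly:

```python
def conv_output_size(W, F, S=1, P=0):
    # Input width W, filter size F, stride S, zero-padding P.
    assert (W - F + 2 * P) % S == 0, "filter does not tile the input evenly"
    return (W - F + 2 * P) // S + 1

stride1 = conv_output_size(7, 3, S=1)        # 5x5 output
stride2 = conv_output_size(7, 3, S=2)        # 3x3 output
padded = conv_output_size(7, 3, S=1, P=1)    # 7x7: padding preserves the size
```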
Activation Maps of Convolutional Filters
5×5 filters
Slide adapted from FeiFei Li, Andrej Karpathy
Activation maps
Each activation map is a depth
slice through the output volume.
Effect of Multiple Convolution Layers
Slide credit: Yann LeCun
Commonly Used Nonlinearities
• Sigmoid: g(a) = 1/(1 + e^(-a))
• Hyperbolic tangent: g(a) = tanh(a)
• Rectified linear unit (ReLU): g(a) = max(0, a)
ReLU is the preferred option for deep networks.
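The three options, written out in code (the formulas are standard):

```python
import numpy as np

def sigmoid(a):
    # Saturates toward 0/1, which can make gradients vanish.
    return 1.0 / (1.0 + np.exp(-a))

def tanh_unit(a):
    # Zero-centered variant of the sigmoid; still saturates.
    return np.tanh(a)

def relu(a):
    # max(0, a): cheap and non-saturating for a > 0, which is why
    # it became the preferred option for deep networks.
    return np.maximum(0.0, a)

a = np.array([-2.0, 0.0, 2.0])
```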
Convolutional Networks: Intuition
• Let’s assume the filter is
an eye detector
How can we make the
detection robust to the
exact location of the eye?
Image source: Yann LeCun. Slide adapted from Marc’Aurelio Ranzato
Convolutional Networks: Intuition
• Let’s assume the filter is
an eye detector
How can we make the
detection robust to the
exact location of the eye?
• Solution:
By pooling (e.g., max or avg)
filter responses at different
spatial locations, we gain
robustness to the exact
spatial location of features.
Image source: Yann LeCun. Slide adapted from Marc’Aurelio Ranzato
Max Pooling
• Effect:
Make the representation smaller without losing too much
information
Achieve robustness to translations
Slide adapted from FeiFei Li, Andrej Karpathy
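A 2×2 max pooling step with stride 2 halves the width and height while keeping the strongest responses; the input values are illustrative:

```python
import numpy as np

def max_pool_2x2(x):
    # Each output value is the maximum of a 2x2 block.
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)
y = max_pool_2x2(x)   # [[6, 8], [3, 4]]
```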
Max Pooling
• Note
Pooling happens independently across each slice, preserving the
number of slices.
Slide adapted from FeiFei Li, Andrej Karpathy
Compare: SIFT Descriptor
Slide credit: Svetlana Lazebnik
Compare: Spatial Pyramid Matching
Slide credit: Svetlana Lazebnik
Topics of This Lecture
• Deep Learning Motivation
• Convolutional Neural Networks
Convolutional Layers
Pooling Layers
Nonlinearities
• CNN Architectures
LeNet
AlexNet
VGGNet
GoogLeNet
• Applications
CNN Architectures: LeNet (1998)
• Early convolutional architecture
2 Convolutional layers, 2 pooling layers
Fully-connected NN layers for classification
Successfully used for handwritten digit recognition (MNIST)
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to
document recognition, Proceedings of the IEEE 86(11): 2278–2324, 1998.
Slide credit: Svetlana Lazebnik
ImageNet Challenge 2012
• ImageNet
~14M labeled internet images
20k classes
Human labels via Amazon
Mechanical Turk
• Challenge (ILSVRC)
1.2 million training images
1000 classes
Goal: Predict ground-truth
class within top-5 responses
Currently one of the top benchmarks in Computer Vision
[Deng et al., CVPR’09]
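The top-5 criterion is easy to state in code; the scores below are illustrative:

```python
import numpy as np

def top5_correct(scores, true_class):
    # A prediction counts as correct if the ground-truth class is
    # among the 5 highest-scoring of the 1000 classes.
    top5 = np.argsort(scores)[-5:]
    return true_class in top5

scores = np.zeros(1000)
scores[[3, 17, 42, 99, 512]] = [9, 8, 7, 6, 5]  # illustrative class scores
hit = top5_correct(scores, true_class=42)        # in the top 5
miss = top5_correct(scores, true_class=0)        # not in the top 5
```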
CNN Architectures: AlexNet (2012)
• Similar framework as LeNet, but
Bigger model (7 hidden layers, 650k units, 60M parameters)
More data (10^6 images instead of 10^3)
GPU implementation
Better regularization and up-to-date tricks for training (Dropout)
Image source: A. Krizhevsky, I. Sutskever and G.E. Hinton, NIPS 2012
A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep
Convolutional Neural Networks, NIPS 2012.
ILSVRC 2012 Results
• AlexNet almost halved the error rate
16.4% error (top-5) vs. 26.2% for the next best approach
A revolution in Computer Vision
Acquired by Google in Jan ’13, deployed in Google+ in May ’13
AlexNet Results
Image source: A. Krizhevsky, I. Sutskever and G.E. Hinton, NIPS 2012
AlexNet Results
Image source: A. Krizhevsky, I. Sutskever and G.E. Hinton, NIPS 2012
Test image Retrieved images
CNN Architectures: VGGNet (2014/15)
Image source: Hirokatsu Kataoka
K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale
Image Recognition, ICLR 2015
CNN Architectures: VGGNet (2014/15)
• Main ideas
Deeper network
Stacked convolutional
layers with smaller
filters (+ nonlinearity)
Detailed evaluation
of all components
• Results
Improved ILSVRC top-5
error rate to 6.7%.
Image source: Simonyan & Zisserman
Comparison: AlexNet vs. VGGNet
• Receptive fields in the first layer
AlexNet: 11×11, stride 4
Zeiler & Fergus: 7×7, stride 2
VGGNet: 3×3, stride 1
• Why that?
If you stack a 3×3 layer on top of another 3×3 layer, you effectively get a 5×5 receptive field.
With three 3×3 layers, the receptive field is already 7×7.
But with far fewer parameters: 3·3² = 27 instead of 7² = 49.
In addition, the nonlinearities in between the 3×3 layers add discriminative power.
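The receptive-field and parameter arithmetic above, checked in code (counts are per filter, ignoring channels and biases):

```python
def stacked_receptive_field(n_layers, f=3):
    # Each stacked fxf layer grows the receptive field by f - 1.
    rf = 1
    for _ in range(n_layers):
        rf += f - 1
    return rf

rf2 = stacked_receptive_field(2)   # two 3x3 layers act like one 5x5
rf3 = stacked_receptive_field(3)   # three 3x3 layers act like one 7x7
params_stacked = 3 * 3 * 3         # 27 weights for three 3x3 filters
params_single = 7 * 7              # 49 weights for one 7x7 filter
```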
CNN Architectures: GoogLeNet (2014)
• Main ideas
“Inception” module as modular component
Learns filters at several scales within each module
C. Szegedy, W. Liu, Y. Jia, et al, Going Deeper with Convolutions,
arXiv:1409.4842, 2014.
GoogLeNet Visualization
Inception module (+ copies)
Auxiliary classification outputs for training the lower layers (deprecated)
Results on ILSVRC
• VGGNet and GoogLeNet perform at similar level
Comparison: human performance ~5% [Karpathy]
Image source: Simonyan & Zisserman
http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
Newest Development: Residual Networks
Newest Development: Residual Networks
• Core component
Skip connections
bypassing each layer
Better propagation of
gradients to the deeper
layers
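A minimal sketch of the skip-connection idea; dense matrices stand in for the convolutional layers here, so this is an illustration of the principle rather than the exact ResNet block:

```python
import numpy as np

def residual_block(x, W1, W2):
    # Compute F(x) + x: the transformation path plus the identity
    # skip path that lets gradients bypass the layers.
    h = np.maximum(0.0, W1 @ x)           # first layer + ReLU
    return np.maximum(0.0, W2 @ h + x)    # second layer, add skip, ReLU

x = np.ones(4)
W_zero = np.zeros((4, 4))
# With all-zero weights the block reduces to the identity mapping:
# F(x) = 0, so the input passes through unchanged.
out = residual_block(x, W_zero, W_zero)
```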
ImageNet Performance
Topics of This Lecture
• Deep Learning Motivation
• Convolutional Neural Networks
Convolutional Layers
Pooling Layers
Nonlinearities
• CNN Architectures
LeNet
AlexNet
VGGNet
GoogLeNet
• Applications
The Learned Features are Generic
• Experiment: feature transfer
Train network on ImageNet
Chop off last layer and train classification layer on CalTech256
State-of-the-art accuracy already with only 6 training images
Image source: M. Zeiler, R. Fergus
(The plot marks the pre-CNN state-of-the-art level for comparison.)
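The feature-transfer experiment can be sketched as follows; the tiny arrays and the frozen random "backbone" are stand-ins for the real ImageNet network and CalTech-256 data:

```python
import numpy as np

rng = np.random.default_rng(1)
W_frozen = rng.normal(size=(64, 16))   # stands in for the pretrained net

def pretrained_features(images):
    # The "chopped" network: a fixed, never-updated feature extractor.
    return images.reshape(len(images), -1) @ W_frozen

images = rng.normal(size=(30, 8, 8))   # 30 tiny stand-in images
labels = rng.integers(0, 3, size=30)   # 3 stand-in classes

feats = pretrained_features(images)    # (30, 16) frozen features
W = np.zeros((16, 3))                  # only the new classifier is trained

for _ in range(300):                   # softmax regression on frozen features
    logits = feats @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad_logits = p.copy()
    grad_logits[np.arange(len(labels)), labels] -= 1.0
    W -= 0.1 * (feats.T @ grad_logits) / len(labels)

train_acc = float((np.argmax(feats @ W, axis=1) == labels).mean())
```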
Other Tasks: Detection
• Results on PASCAL VOC Detection benchmark
Pre-CNN state of the art: 35.1% mAP [Uijlings et al., 2013]
DPM: 33.4% mAP
R-CNN: 53.7% mAP
R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for
Accurate Object Detection and Semantic Segmentation, CVPR 2014
Faster R-CNN (based on ResNets)
K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition,
CVPR 2016.
Other Tasks: Semantic Segmentation
[Farabet et al. ICML 2012, PAMI 2013]
Semantic Segmentation
• More recent results
Based on an extension of ResNets
[Pohlen, Hermans, Mathias, Leibe, arXiv 2016]
Other Tasks: Face Verification
Y. Taigman, M. Yang, M. Ranzato, L. Wolf, DeepFace: Closing the Gap to Human-
Level Performance in Face Verification, CVPR 2014
Slide credit: Svetlana Lazebnik
Commercial Recognition Services
• E.g.,
• Be careful when taking test images from Google Search
Chances are they may have been seen in the training set...
Image source: clarifai.com
Commercial Recognition Services
Image source: clarifai.com
References and Further Reading
• LeNet
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based
learning applied to document recognition, Proceedings of the IEEE
86(11): 2278–2324, 1998.
• AlexNet
A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification
with Deep Convolutional Neural Networks, NIPS 2012.
• VGGNet
K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for
Large-Scale Image Recognition, ICLR 2015
• GoogLeNet
C. Szegedy, W. Liu, Y. Jia, et al, Going Deeper with Convolutions,
arXiv:1409.4842, 2014.