Source: cs.haifa.ac.il/~rita/courses_moodle/object_recognition_seminar/intro.pdf
TRANSCRIPT
Object Recognition
Seminar
Rita Osadchy
So what does object recognition involve?
Verification: is that a bus?
Detection: locate the cars in the image
Identification: is that a picture of Mao?
Object categorization
Example: a street scene labeled with categories such as sky, building, flag, wall, banner, bus, cars, face, and street lamp.
Challenges 1: viewpoint variation
Michelangelo 1475-1564
Challenges 2: illumination
slide credit: S. Ullman
Challenges 3: occlusion
Magritte, 1957
Challenges 4: scale
Challenges 5: deformation
Xu, Beihong 1943
Challenges 7: intra-class variation
Recognition Steps
Training: train images → preprocessing → features / image representation → train the classifier
Testing: test images → preprocessing → features / image representation → classifier → output labels
Object Recognition System
Example: a camera images fish (salmon vs. sea bass) on a conveyor belt, and a classifier routes each fish to the correct sorting chamber.
How to design a PR (pattern recognition) system?
Collect data and classify it by hand (e.g., example images labeled salmon, salmon, salmon, sea bass, sea bass, sea bass).
Preprocess by segmenting fish from background
Extract possibly discriminating features: length, lightness, width, number of fins, etc.
Classifier design
Choose model
Train classifier on part of collected data (training data)
Test classifier on the rest of the collected data (test data), i.e. the data not used for training. It should classify new data (new fish images) well.
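As a toy illustration of this design loop, the sketch below trains and tests a simple classifier on two hand-picked features (length and lightness); the feature values and the use of scikit-learn's logistic regression are illustrative assumptions, not part of the original seminar.

```python
# Toy sketch of the design loop above: hand-labeled data, two candidate features,
# a train/test split, and a simple classifier. Feature values are made up.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Each row: [length (cm), lightness]; labels: 0 = salmon, 1 = sea bass.
X = np.array([[55, 0.20], [60, 0.25], [58, 0.30],
              [70, 0.60], [75, 0.55], [72, 0.65]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# Train on part of the collected data, hold out the rest for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```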
Interest Point Detectors
• Basic requirements:
– Sparse
– Informative
– Repeatable
• Invariance
– Rotation
– Scale (Similarity)
– Affine
Lecture 3
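As a concrete example of such a detector, the sketch below extracts SIFT keypoints and descriptors with OpenCV; the file name is a placeholder, and SIFT is used only as one possible detector meeting the requirements above.

```python
# Sketch: sparse, repeatable, scale- and rotation-invariant interest points via SIFT.
import cv2

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
sift = cv2.SIFT_create()

# Keypoints carry location, scale, and orientation; descriptors are 128-D vectors.
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), "keypoints; descriptor array shape:", descriptors.shape)
```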
Learned models of local features, with the object outline obtained from background subtraction. Objects may then be found under occlusion and 3D rotation.
Images from: David Lowe, Object Recognition from Local Scale-Invariant Features, Proc. of the International Conference on Computer Vision, Corfu (Sept. 1999)
Lecture 3
Recognizing Specific Objects
Bag of Features
Lecture 3
Pros: fairly flexible and computationally efficient
Cons: problems with large clutter
Different objects, but similar representations; similar objects, but different representations.
Bag of Features
Lecture 3
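A minimal sketch of the bag-of-features pipeline, assuming local descriptors (e.g., SIFT) have already been extracted per image; here random vectors stand in for real descriptors, and k-means builds the visual vocabulary.

```python
# Bag of features: cluster local descriptors into a visual vocabulary, then
# represent each image as a (normalized) histogram of visual-word counts.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors_per_image = [rng.normal(size=(200, 128)) for _ in range(10)]  # stand-ins for SIFT

vocab_size = 50
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
kmeans.fit(np.vstack(descriptors_per_image))          # visual vocabulary

def bag_of_features(descriptors):
    """L1-normalized histogram of visual-word assignments."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

histograms = np.array([bag_of_features(d) for d in descriptors_per_image])
print(histograms.shape)  # (10, 50): one fixed-length vector per image
```

Because the histogram discards all spatial layout, different objects can end up with similar representations (and vice versa), which is the weakness noted above.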
• Computing bags of features on sub-windows of the whole image (spatial pyramid levels 0, 1, 2).
Lazebnik, Schmid & Ponce (CVPR 2006)
Beyond Bags of Features
Lecture 3
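A rough sketch of that idea, building on the bag-of-features sketch above (it reuses `bag_of_features` and `vocab_size`): histograms are computed on a grid of sub-windows at each pyramid level and concatenated. Keypoint positions are assumed to be available.

```python
# Spatial pyramid (in the spirit of Lazebnik et al.): concatenate bag-of-features
# histograms computed over sub-windows at several pyramid levels.
import numpy as np

def spatial_pyramid(positions, descriptors, width, height, levels=(0, 1, 2)):
    parts = []
    for level in levels:
        cells = 2 ** level                       # level 2 -> a 4x4 grid of sub-windows
        cx = np.minimum((positions[:, 0] * cells // width).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells // height).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)
                parts.append(bag_of_features(descriptors[in_cell])
                             if in_cell.any() else np.zeros(vocab_size))
    return np.concatenate(parts)                 # one long vector per image
```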
Convolutional Neural Networks
• Learn all in one deep architecture:
– low level features
– high level representations
– context
– classifiers
• Efficient Classification
• Efficient Detection
• Scalable to very large datasets and a large number of categories
Lecture 4
A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks
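To make the idea concrete, here is a deliberately small convolutional network in PyTorch; it is an illustrative sketch, not the AlexNet architecture from the cited paper.

```python
# Convolutional layers learn low-level features; deeper layers build higher-level
# representations; a final linear layer acts as the classifier.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)  # assumes 32x32 inputs

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = SmallCNN()(torch.randn(4, 3, 32, 32))  # a batch of 4 RGB 32x32 images
print(logits.shape)                             # torch.Size([4, 10])
```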
Convolutional Neural Networks
Lecture 4
Training the Network
The network maps an input X to class probabilities (e.g., p_0 = 0.01, p_1 = 0.03, …, p_{N-1} = 0.02, p_N = 0.015) over categories such as cat, banana, mug, dog; these are compared against the one-hot target Y (here 0, 1, 0, 0, i.e. banana).
Compute the gradient of the loss with respect to the parameters of the network.
Lecture 4
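A minimal sketch of one training step, reusing the SmallCNN sketch above with random stand-in data: the softmax/cross-entropy loss compares the predicted class probabilities with the one-hot label, and backpropagation computes the gradient with respect to every parameter.

```python
import torch
import torch.nn as nn

model = SmallCNN(num_classes=4)                 # e.g., classes: cat, banana, mug, dog
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()                 # softmax + negative log-likelihood

x = torch.randn(8, 3, 32, 32)                   # X: a batch of images (random here)
y = torch.randint(0, 4, (8,))                   # Y: integer labels (one-hot implied)

loss = loss_fn(model(x), y)                     # compare predictions with targets
optimizer.zero_grad()
loss.backward()                                 # gradient w.r.t. all network parameters
optimizer.step()                                # update the parameters
```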
Very Deep Networks
Lecture 4
• K. Simonyan and A. Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition
• K. He, X. Zhang, S. Ren, and J. Sun: Deep Residual Learning for Image Recognition
Detection
Lecture 5
Apply the classifier over a range of scales and positions to search the image.
– Combine detections over space and scale.
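A sketch of that sliding-window recipe; `score_window` is a hypothetical stand-in for any trained window classifier, and OpenCV is used only for resizing.

```python
# Apply a window classifier at a range of positions and scales; detections over
# space and scale would then be combined (e.g., by non-maximum suppression).
import cv2

def sliding_window_detect(image, score_window, window=64, stride=16,
                          scales=(1.0, 0.75, 0.5), threshold=0.5):
    detections = []
    for s in scales:
        scaled = cv2.resize(image, None, fx=s, fy=s)
        h, w = scaled.shape[:2]
        for y in range(0, h - window + 1, stride):
            for x in range(0, w - window + 1, stride):
                score = score_window(scaled[y:y + window, x:x + window])
                if score > threshold:
                    # report the box in original-image coordinates
                    detections.append((x / s, y / s, window / s, score))
    return detections
```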
Deep Learning in Object Detection
Figure: the R-CNN pipeline, in which region proposals are scored by an SVM classifier.
Lecture 5
Rich feature hierarchies for accurate object detection and semantic segmentation. R. Girshick et al.
Faster R-CNN
Faster R-CNN is a single,
unified network for object
detection. The RPN module
serves as the ‘attention’ of this
unified network.
Lecture 5
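For reference, an off-the-shelf Faster R-CNN detector can be run via torchvision (assuming torchvision ≥ 0.13 for the `weights` argument); this is only a usage sketch, not the authors' original implementation.

```python
import torch
import torchvision

# Downloads pretrained weights; the RPN generates region proposals internally.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)                 # placeholder for a real image in [0, 1]
with torch.no_grad():
    output = model([image])[0]                  # one dict per input image

# Final detections: boxes, class labels, and confidence scores.
print(output["boxes"].shape, output["labels"].shape, output["scores"].shape)
```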
Transfer Learning
Lecture 6
J. Donahue et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition.
One Shot Learning
Lecture 6
G. Koch, R. Zemel, and R. Salakhutdinov: Siamese Neural Networks for One-shot Image Recognition
Few-Shot Learning
• Prototypical Networks
• Maps examples to an embedding space such that examples of a given class are close together
• Calculates a prototype (mean vector) for every class
• Maps test instances to the same embedding space
• Uses a softmax over distances to the prototypes for label prediction
Lecture 6
Prototypical Networks for Few-shot Learning (2017), Jake Snell, Kevin Swersky, Richard S. Zemel
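A minimal sketch of the prototypical-network prediction rule described above; `embed` stands in for the learned embedding network, and the data is random.

```python
import torch
import torch.nn.functional as F

def prototypical_predict(embed, support_x, support_y, query_x, num_classes):
    support_z = embed(support_x)                          # embed support examples
    query_z = embed(query_x)                              # embed test (query) examples
    # One prototype per class: the mean embedding of its support examples.
    prototypes = torch.stack([support_z[support_y == c].mean(dim=0)
                              for c in range(num_classes)])
    # Squared Euclidean distance from each query to each prototype.
    dists = ((query_z[:, None, :] - prototypes[None, :, :]) ** 2).sum(dim=2)
    return F.softmax(-dists, dim=1)                       # distribution over classes

# Toy usage: a linear "embedding", 3 classes with 5 support examples each.
embed = torch.nn.Linear(32, 16)
support_x, query_x = torch.randn(15, 32), torch.randn(4, 32)
support_y = torch.arange(3).repeat_interleave(5)
print(prototypical_predict(embed, support_x, support_y, query_x, num_classes=3).shape)
```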
Describing Objects with Attributes
• Shift the goal of recognition from naming
to describing
Discover/detect new
categories
Lecture 7
Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth
Improvement
Lecture 7
Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling
Deep Learning for Faces
Survey
Lecture 8
Image Descriptions (Captioning)
Uses CNNs and RNNs
Lecture 9
Deep Visual-Semantic Alignments for Generating Image Descriptions, A. Karpathy, Li Fei-Fei.
VQA: Visual Question
Answering
Lecture 9
VQA: Visual Question Answering. Antol et al.
Results
Lecture 9
Generative Adversarial Networks (GAN)
• GAN takes a game-theoretic approach: learn to generate from the training distribution through a 2-player game.
G(z) tries to fool the discriminator by generating real-looking images.
D(x) tries to distinguish between real and fake images.
Lecture 10
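A minimal sketch of that two-player game; the networks and "data" are tiny placeholders (vectors rather than images), just to show how D and G are trained against each other.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))   # z -> fake sample
D = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))    # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):
    real = torch.randn(32, 32) + 3.0            # stand-in for real training data
    fake = G(torch.randn(32, 16))               # G(z): generated samples

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step: fool D into labeling fakes as real.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(32, 1))
    g_loss.backward()
    opt_g.step()
```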
GAN: Convolutional Architecture
Radford et al., “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, 2016
Lecture 10
Image Transfer
Lecture 10
CycleGAN
Discriminator D_Y pushes generator G to translate X into outputs indistinguishable from domain Y, and vice versa for D_X and F.
Forward cycle-consistency loss: x → G(x) → F(G(x)) ≈ x
Backward cycle-consistency loss: y → F(y) → G(F(y)) ≈ y
Lecture 10
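A sketch of just the cycle-consistency terms (the adversarial losses from D_X and D_Y are omitted); G and F are placeholder networks on vectors rather than images, and an L1 penalty is used as in CycleGAN.

```python
import torch
import torch.nn as nn

G = nn.Linear(32, 32)       # G: X -> Y
F_net = nn.Linear(32, 32)   # F in the slide: Y -> X
l1 = nn.L1Loss()

x = torch.randn(8, 32)      # samples from domain X
y = torch.randn(8, 32)      # samples from domain Y

forward_cycle = l1(F_net(G(x)), x)   # x -> G(x) -> F(G(x)) ≈ x
backward_cycle = l1(G(F_net(y)), y)  # y -> F(y) -> G(F(y)) ≈ y
cycle_loss = forward_cycle + backward_cycle
```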
Data Augmentation Generative
Adversarial Network (DAGAN):
• Generator:
– an encoder maps the input image into latent space
– a random vector is concatenated with the latent vector
– the concatenated vector is passed to the decoder to generate an image
• Discriminator: similar to a standard GAN
A. Antoniou et al., “Data Augmentation Generative Adversarial Networks”
Lecture 10
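A sketch of the generator structure described in the bullets above; the layer sizes and the 28x28 image shape are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DAGANGenerator(nn.Module):
    def __init__(self, latent_dim=64, noise_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, latent_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent_dim + noise_dim, 28 * 28), nn.Tanh())
        self.noise_dim = noise_dim

    def forward(self, image):
        latent = self.encoder(image)                          # encode the input image
        noise = torch.randn(image.size(0), self.noise_dim)    # random vector
        code = torch.cat([latent, noise], dim=1)              # concatenate with latent vector
        return self.decoder(code).view(-1, 1, 28, 28)         # decode to an augmented image

augmented = DAGANGenerator()(torch.rand(4, 1, 28, 28))
print(augmented.shape)                                        # torch.Size([4, 1, 28, 28])
```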
Video Classification
• Using CNN – Naïve Approach
Lecture 11
Karpathy et al.: Large-scale Video Classification with Convolutional Neural Networks
Video Classification
• Using CNN – Naïve Approach
Temporal fusion
Lecture 11
Karpathy et al.: Large-scale Video Classification with Convolutional Neural Networks
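A minimal sketch of this per-frame baseline, reusing the SmallCNN sketch from earlier: an image CNN scores every frame independently and the predictions are fused over time by averaging (late fusion). The clip tensor is random.

```python
import torch

model = SmallCNN(num_classes=10)
video = torch.randn(16, 3, 32, 32)              # a toy clip of 16 frames

with torch.no_grad():
    per_frame_logits = model(video)             # treat the frames as a batch of images
    clip_prediction = per_frame_logits.mean(dim=0).argmax()  # temporal (late) fusion
print(clip_prediction)
```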
Video Classification
• Modern Approaches
Lecture 11
K. Simonyan, A. Zisserman: Two-Stream Convolutional Networks for Action Recognition in Videos.
Multi-task training
Figure: a random batch is drawn from the training set.
Lecture 12
Continual Learning
Figure: separate training sets arrive at times T1, T2, T3, …
Lecture 12
Continual Learning
• Tasks are learned sequentially over time.
• At time Ti, the data of the tasks T1…Ti-1 is no longer available.
• Leads to forgetting of previously learned tasks.
• Termed “Catastrophic Forgetting”.
Possible Solution:
• Measure the importance of the learnable parameters in the network for the current task, and constrain their change when learning a new task.
• This could work because deep networks are highly overparametrized.
James Kirkpatrick et al., “Overcoming catastrophic forgetting in neural networks”
Lecture 12
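A sketch of that importance-weighted constraint, in the spirit of EWC from the cited paper: after finishing a task, keep a copy of the parameters and a per-parameter importance estimate (e.g., the diagonal Fisher information), and penalize deviations from them while learning the next task. The dictionary-based interface here is an assumption for illustration.

```python
import torch

def ewc_penalty(model, old_params, importance, strength=1000.0):
    """Quadratic penalty that discourages changing parameters important to old tasks."""
    penalty = torch.tensor(0.0)
    for name, p in model.named_parameters():
        penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return (strength / 2.0) * penalty

# While training on task B:
#   loss = task_b_loss + ewc_penalty(model, params_after_task_a, fisher_diagonal)
```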
Adversarial Examples
Lecture 12
Explaining and Harnessing Adversarial Examples by Goodfellow et al.
Adversarial Examples: Imperceptible Noise
Intriguing properties of neural networks by Szegedy et al.
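The Goodfellow et al. paper cited above introduces the fast gradient sign method; a minimal sketch, assuming `model` is any differentiable classifier and the input image is normalized to [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.01):
    """Add a small perturbation in the direction of the sign of the input gradient."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()                                    # gradient of the loss w.r.t. the input
    adversarial = image + epsilon * image.grad.sign()  # imperceptible worst-case step
    return adversarial.clamp(0, 1).detach()
```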