Source: cs.haifa.ac.il/~rita/courses_moodle/object_recognition_seminar/intro.pdf
TRANSCRIPT
Object Recognition
Seminar
Rita Osadchy
So what does object recognition involve?
Verification: is that a bus?
Detection: locate the cars in the image
Identification: is that a picture of Mao?
Object categorization
Example: a street scene labeled with categories such as sky, building, flag, wall, banner, bus, cars, face, and street lamp.
Challenges 1: viewpoint variation
Michelangelo 1475-1564
Challenges 2: illumination
slide credit: S. Ullman
Challenges 3: occlusion
Magritte, 1957
Challenges 4: scale
Challenges 5: deformation
Xu, Beihong 1943
Challenges 7: intra-class variation
Recognition Steps
Training: train images → preprocessing → features / image representation → train the classifier
Testing: test images → preprocessing → features / image representation → classifier → output labels
Object Recognition System
Example: a camera images fish (salmon vs. sea bass) on a conveyor belt, and a classifier routes each fish to the correct sorting chamber.
How to design a PR (pattern recognition) system?
Collect data and classify it by hand (e.g., example images labeled salmon, salmon, salmon, sea bass, sea bass, sea bass).
Preprocess by segmenting fish from background
Extract possibly discriminating features: length, lightness, width, number of fins, etc.
Classifier design
Choose model
Train classifier on part of collected data (training data)
Test classifier on the rest of the collected data (test data), i.e. the data not used for training. It should classify new data (new fish images) well.
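As a toy illustration of this design loop, the sketch below trains and tests a simple classifier on two hand-picked features (length and lightness); the feature values and the use of scikit-learn's logistic regression are illustrative assumptions, not part of the original seminar.

```python
# Toy sketch of the design loop above: hand-labeled data, two candidate features,
# a train/test split, and a simple classifier. Feature values are made up.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Each row: [length (cm), lightness]; labels: 0 = salmon, 1 = sea bass.
X = np.array([[55, 0.20], [60, 0.25], [58, 0.30],
              [70, 0.60], [75, 0.55], [72, 0.65]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# Train on part of the collected data, hold out the rest for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```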
Interest Point Detectors
• Basic requirements:
– Sparse
– Informative
– Repeatable
• Invariance
– Rotation
– Scale (Similarity)
– Affine
Lecture 3
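As a concrete example of such a detector, the sketch below extracts SIFT keypoints and descriptors with OpenCV; the file name is a placeholder, and SIFT is used only as one possible detector meeting the requirements above.

```python
# Sketch: sparse, repeatable, scale- and rotation-invariant interest points via SIFT.
import cv2

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
sift = cv2.SIFT_create()

# Keypoints carry location, scale, and orientation; descriptors are 128-D vectors.
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), "keypoints; descriptor array shape:", descriptors.shape)
```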
Learned models of local features, with the object outline obtained from background subtraction. Objects may then be found under occlusion and 3D rotation.
Images from: David Lowe, Object Recognition from Local Scale-Invariant Features, Proc. of the International Conference on Computer Vision, Corfu (Sept. 1999)
Lecture 3
Recognizing Specific Objects
Bag of Features
Lecture 3
Pros: fairly flexible and computationally efficient
Cons: problems with large clutter
Different objects, but similar representations; similar objects, but different representations.
Bag of Features
Lecture 3
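A minimal sketch of the bag-of-features pipeline, assuming local descriptors (e.g., SIFT) have already been extracted per image; here random vectors stand in for real descriptors, and k-means builds the visual vocabulary.

```python
# Bag of features: cluster local descriptors into a visual vocabulary, then
# represent each image as a (normalized) histogram of visual-word counts.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors_per_image = [rng.normal(size=(200, 128)) for _ in range(10)]  # stand-ins for SIFT

vocab_size = 50
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
kmeans.fit(np.vstack(descriptors_per_image))          # visual vocabulary

def bag_of_features(descriptors):
    """L1-normalized histogram of visual-word assignments."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

histograms = np.array([bag_of_features(d) for d in descriptors_per_image])
print(histograms.shape)  # (10, 50): one fixed-length vector per image
```

Because the histogram discards all spatial layout, different objects can end up with similar representations (and vice versa), which is the weakness noted above.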
• Computing bags of features on sub-windows of the whole image (spatial pyramid levels 0, 1, 2).
Lazebnik, Schmid & Ponce (CVPR 2006)
Beyond Bags of Features
Lecture 3
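A rough sketch of that idea, building on the bag-of-features sketch above (it reuses `bag_of_features` and `vocab_size`): histograms are computed on a grid of sub-windows at each pyramid level and concatenated. Keypoint positions are assumed to be available.

```python
# Spatial pyramid (in the spirit of Lazebnik et al.): concatenate bag-of-features
# histograms computed over sub-windows at several pyramid levels.
import numpy as np

def spatial_pyramid(positions, descriptors, width, height, levels=(0, 1, 2)):
    parts = []
    for level in levels:
        cells = 2 ** level                       # level 2 -> a 4x4 grid of sub-windows
        cx = np.minimum((positions[:, 0] * cells // width).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells // height).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)
                parts.append(bag_of_features(descriptors[in_cell])
                             if in_cell.any() else np.zeros(vocab_size))
    return np.concatenate(parts)                 # one long vector per image
```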
Convolutional Neural Networks
• Learn all in one deep architecture:
– low level features
– high level representations
– context
– classifiers
• Efficient Classification
• Efficient Detection
• Scalable to very large datasets and a large number of categories
Lecture 4
A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks
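To make the idea concrete, here is a deliberately small convolutional network in PyTorch; it is an illustrative sketch, not the AlexNet architecture from the cited paper.

```python
# Convolutional layers learn low-level features; deeper layers build higher-level
# representations; a final linear layer acts as the classifier.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)  # assumes 32x32 inputs

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = SmallCNN()(torch.randn(4, 3, 32, 32))  # a batch of 4 RGB 32x32 images
print(logits.shape)                             # torch.Size([4, 10])
```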
Convolutional Neural Networks
Lecture 4
Training the Network
The network maps an input X to class probabilities (e.g., p_0 = 0.01, p_1 = 0.03, …, p_{N-1} = 0.02, p_N = 0.015) over categories such as cat, banana, mug, dog; these are compared against the one-hot target Y (here 0, 1, 0, 0, i.e. banana).
Compute the gradient of the loss with respect to the parameters of the network.
Lecture 4
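A minimal sketch of one training step, reusing the SmallCNN sketch above with random stand-in data: the softmax/cross-entropy loss compares the predicted class probabilities with the one-hot label, and backpropagation computes the gradient with respect to every parameter.

```python
import torch
import torch.nn as nn

model = SmallCNN(num_classes=4)                 # e.g., classes: cat, banana, mug, dog
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()                 # softmax + negative log-likelihood

x = torch.randn(8, 3, 32, 32)                   # X: a batch of images (random here)
y = torch.randint(0, 4, (8,))                   # Y: integer labels (one-hot implied)

loss = loss_fn(model(x), y)                     # compare predictions with targets
optimizer.zero_grad()
loss.backward()                                 # gradient w.r.t. all network parameters
optimizer.step()                                # update the parameters
```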
Very Deep Networks
Lecture 4
• K. Simonyan and A. Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition
• K. He, X. Zhang, S. Ren, and J. Sun: Deep Residual Learning for Image Recognition
Detection
Lecture 5
Apply the classifier over a range of scales and positions to search the image.
– Combine detections over space and scale.
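A sketch of that sliding-window recipe; `score_window` is a hypothetical stand-in for any trained window classifier, and OpenCV is used only for resizing.

```python
# Apply a window classifier at a range of positions and scales; detections over
# space and scale would then be combined (e.g., by non-maximum suppression).
import cv2

def sliding_window_detect(image, score_window, window=64, stride=16,
                          scales=(1.0, 0.75, 0.5), threshold=0.5):
    detections = []
    for s in scales:
        scaled = cv2.resize(image, None, fx=s, fy=s)
        h, w = scaled.shape[:2]
        for y in range(0, h - window + 1, stride):
            for x in range(0, w - window + 1, stride):
                score = score_window(scaled[y:y + window, x:x + window])
                if score > threshold:
                    # report the box in original-image coordinates
                    detections.append((x / s, y / s, window / s, score))
    return detections
```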
Deep Learning in Object Detection
Figure: the R-CNN pipeline, in which region proposals are scored by an SVM classifier.
Lecture 5
Rich feature hierarchies for accurate object detection and semantic segmentation. R. Girshick et al.
Faster R-CNN
Faster R-CNN is a single,
unified network for object
detection. The RPN module
serves as the ‘attention’ of this
unified network.
Lecture 5
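For reference, an off-the-shelf Faster R-CNN detector can be run via torchvision (assuming torchvision ≥ 0.13 for the `weights` argument); this is only a usage sketch, not the authors' original implementation.

```python
import torch
import torchvision

# Downloads pretrained weights; the RPN generates region proposals internally.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)                 # placeholder for a real image in [0, 1]
with torch.no_grad():
    output = model([image])[0]                  # one dict per input image

# Final detections: boxes, class labels, and confidence scores.
print(output["boxes"].shape, output["labels"].shape, output["scores"].shape)
```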
Transfer Learning
Lecture 6
J. Donahue et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition.
One Shot Learning
Lecture 6
G. Koch, R. Zemel, and R. Salakhutdinov: Siamese Neural Networks for One-shot Image Recognition
Few-Shot Learning
• Prototypical Networks
• Maps examples to an embedding space such that examples of a given class are close together
• Calculates a prototype (mean vector) for every class
• Maps test instances to the same embedding space
• Uses a softmax over distances to the prototypes for label prediction
Lecture 6
Prototypical Networks for Few-shot Learning (2017), Jake Snell, Kevin Swersky, Richard S. Zemel
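A minimal sketch of the prototypical-network prediction rule described above; `embed` stands in for the learned embedding network, and the data is random.

```python
import torch
import torch.nn.functional as F

def prototypical_predict(embed, support_x, support_y, query_x, num_classes):
    support_z = embed(support_x)                          # embed support examples
    query_z = embed(query_x)                              # embed test (query) examples
    # One prototype per class: the mean embedding of its support examples.
    prototypes = torch.stack([support_z[support_y == c].mean(dim=0)
                              for c in range(num_classes)])
    # Squared Euclidean distance from each query to each prototype.
    dists = ((query_z[:, None, :] - prototypes[None, :, :]) ** 2).sum(dim=2)
    return F.softmax(-dists, dim=1)                       # distribution over classes

# Toy usage: a linear "embedding", 3 classes with 5 support examples each.
embed = torch.nn.Linear(32, 16)
support_x, query_x = torch.randn(15, 32), torch.randn(4, 32)
support_y = torch.arange(3).repeat_interleave(5)
print(prototypical_predict(embed, support_x, support_y, query_x, num_classes=3).shape)
```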
Describing Objects with Attributes
• Shift the goal of recognition from naming
to describing
Discover/detect new
categories
Lecture 7
Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth
Improvement
Lecture 7
Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling
Deep Learning for Faces
Survey
Lecture 8
Image Descriptions (Captioning)
Uses CNNs and RNNs
Lecture 9
Deep Visual-Semantic Alignments for Generating Image Descriptions, A. Karpathy, Li Fei-Fei.
VQA: Visual Question
Answering
Lecture 9
VQA: Visual Question Answering. Antol et al.
Results
Lecture 9
Generative Adversarial Networks (GAN)
• GAN takes a game-theoretic approach: learn to generate from the training distribution through a 2-player game.
G(z) tries to fool the discriminator by generating real-looking images.
D(x) tries to distinguish between real and fake images.
Lecture 10
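A minimal sketch of that two-player game; the networks and "data" are tiny placeholders (vectors rather than images), just to show how D and G are trained against each other.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))   # z -> fake sample
D = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))    # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):
    real = torch.randn(32, 32) + 3.0            # stand-in for real training data
    fake = G(torch.randn(32, 16))               # G(z): generated samples

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step: fool D into labeling fakes as real.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(32, 1))
    g_loss.backward()
    opt_g.step()
```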
GAN: Convolutional Architecture
Radford et al., “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, 2016
Lecture 10
Image Transfer
Lecture 10
CycleGAN
Discriminator D_Y pushes generator G to translate X into outputs indistinguishable from domain Y, and vice versa for D_X and F.
Forward cycle-consistency loss: x → G(x) → F(G(x)) ≈ x
Backward cycle-consistency loss: y → F(y) → G(F(y)) ≈ y
Lecture 10
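A sketch of just the cycle-consistency terms (the adversarial losses from D_X and D_Y are omitted); G and F are placeholder networks on vectors rather than images, and an L1 penalty is used as in CycleGAN.

```python
import torch
import torch.nn as nn

G = nn.Linear(32, 32)       # G: X -> Y
F_net = nn.Linear(32, 32)   # F in the slide: Y -> X
l1 = nn.L1Loss()

x = torch.randn(8, 32)      # samples from domain X
y = torch.randn(8, 32)      # samples from domain Y

forward_cycle = l1(F_net(G(x)), x)   # x -> G(x) -> F(G(x)) ≈ x
backward_cycle = l1(G(F_net(y)), y)  # y -> F(y) -> G(F(y)) ≈ y
cycle_loss = forward_cycle + backward_cycle
```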
Data Augmentation Generative
Adversarial Network (DAGAN):
• Generator:
– an encoder maps the input image into latent space
– a random vector is concatenated with the latent vector
– the concatenated vector is passed to the decoder to generate an image
• Discriminator: similar to a standard GAN
A. Antoniou et al., “Data Augmentation Generative Adversarial Networks”
Lecture 10
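A sketch of the generator structure described in the bullets above; the layer sizes and the 28x28 image shape are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DAGANGenerator(nn.Module):
    def __init__(self, latent_dim=64, noise_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, latent_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent_dim + noise_dim, 28 * 28), nn.Tanh())
        self.noise_dim = noise_dim

    def forward(self, image):
        latent = self.encoder(image)                          # encode the input image
        noise = torch.randn(image.size(0), self.noise_dim)    # random vector
        code = torch.cat([latent, noise], dim=1)              # concatenate with latent vector
        return self.decoder(code).view(-1, 1, 28, 28)         # decode to an augmented image

augmented = DAGANGenerator()(torch.rand(4, 1, 28, 28))
print(augmented.shape)                                        # torch.Size([4, 1, 28, 28])
```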
Video Classification
• Using CNN – Naïve Approach
Lecture 11
Karpathy et al.: Large-scale Video Classification with Convolutional Neural Networks
Video Classification
• Using CNN – Naïve Approach
Temporal fusion
Lecture 11
Karpathy et al.: Large-scale Video Classification with Convolutional Neural Networks
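A minimal sketch of this per-frame baseline, reusing the SmallCNN sketch from earlier: an image CNN scores every frame independently and the predictions are fused over time by averaging (late fusion). The clip tensor is random.

```python
import torch

model = SmallCNN(num_classes=10)
video = torch.randn(16, 3, 32, 32)              # a toy clip of 16 frames

with torch.no_grad():
    per_frame_logits = model(video)             # treat the frames as a batch of images
    clip_prediction = per_frame_logits.mean(dim=0).argmax()  # temporal (late) fusion
print(clip_prediction)
```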
Video Classification
• Modern Approaches
Lecture 11
K. Simonyan, A. Zisserman: Two-Stream Convolutional Networks for Action Recognition in Videos.
Multi-task training
Figure: a random batch is drawn from the training set.
Lecture 12
Continual Learning
Figure: separate training sets arrive at times T1, T2, T3, …
Lecture 12
Continual Learning
• Tasks are learned sequentially over time.
• At time Ti, the data of the tasks T1…Ti-1 is no longer available.
• Leads to forgetting of previously learned tasks.
• Termed “Catastrophic Forgetting”.
Possible Solution:
• Measure the importance of the learnable parameters in the network for the current task, and constrain their change when learning a new task.
• This could work because deep networks are highly overparametrized.
James Kirkpatrick et al., “Overcoming catastrophic forgetting in neural networks”
Lecture 12
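A sketch of that importance-weighted constraint, in the spirit of EWC from the cited paper: after finishing a task, keep a copy of the parameters and a per-parameter importance estimate (e.g., the diagonal Fisher information), and penalize deviations from them while learning the next task. The dictionary-based interface here is an assumption for illustration.

```python
import torch

def ewc_penalty(model, old_params, importance, strength=1000.0):
    """Quadratic penalty that discourages changing parameters important to old tasks."""
    penalty = torch.tensor(0.0)
    for name, p in model.named_parameters():
        penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return (strength / 2.0) * penalty

# While training on task B:
#   loss = task_b_loss + ewc_penalty(model, params_after_task_a, fisher_diagonal)
```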
Adversarial Examples
Lecture 12
Explaining and Harnessing Adversarial Examples by Goodfellow et al.
Adversarial Examples: Imperceptible Noise
Intriguing properties of neural networks by Szegedy et al.
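The Goodfellow et al. paper cited above introduces the fast gradient sign method; a minimal sketch, assuming `model` is any differentiable classifier and the input image is normalized to [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.01):
    """Add a small perturbation in the direction of the sign of the input gradient."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()                                    # gradient of the loss w.r.t. the input
    adversarial = image + epsilon * image.grad.sign()  # imperceptible worst-case step
    return adversarial.clamp(0, 1).detach()
```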