Object detection, deep learning, and R-CNNs
Ross Girshick Microsoft Research
Guest lecture for UW CSE 455
Nov. 24, 2014
Outline
• Object detection
  • the task, evaluation, datasets
• Convolutional Neural Networks (CNNs)
  • overview and history
• Region-based Convolutional Networks (R-CNNs)
Image classification
• 𝐾 classes
• Task: assign the correct class label to the whole image
Examples: digit classification (MNIST), object recognition (Caltech-101)
Classification vs. Detection
Dog
Dog Dog
Problem formulation
Input: an image and a set of classes, e.g. { airplane, bird, motorbike, person, sofa }
Desired output: a labeled box around each object (in the example image: person, motorbike)
Evaluating a detector
Test image (previously unseen)
First detection ...
‘person’ detector predictions: 0.9
Second detection ...
‘person’ detector predictions: 0.9, 0.6
Third detection ...
‘person’ detector predictions: 0.9, 0.6, 0.2
Compare to ground truth
ground truth ‘person’ boxes
‘person’ detector predictions: 0.9, 0.6, 0.2
Sort by confidence
Detections sorted by confidence: 0.9, 0.8, 0.6, 0.5, 0.2, 0.1, ...
✓ true positive (high overlap)
X false positive (no overlap, low overlap, or duplicate)
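The true/false positive rule above can be sketched in a few lines. This is only an illustration under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, "overlap" is measured as intersection-over-union, and "high" means at least 0.5 (the usual PASCAL VOC convention, which the slide does not state). The function names `iou` and `label_detections` are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_detections(detections, gt_boxes, min_overlap=0.5):
    """Label each (score, box) detection as True (TP) or False (FP).

    Detections are visited in order of decreasing confidence. Each ground-truth
    box can be matched at most once, so a duplicate counts as a false positive.
    min_overlap=0.5 is an assumed threshold, not taken from the slides.
    """
    matched = set()
    labels = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Find the ground-truth box that overlaps this detection the most.
        best_j, best_iou = None, 0.0
        for j, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None and best_iou >= min_overlap and best_j not in matched:
            matched.add(best_j)
            labels.append((score, True))
        else:
            labels.append((score, False))
    return labels


# Toy data (made up): one ground-truth box, three detections.
gt = [(0, 0, 10, 10)]
dets = [(0.9, (0, 0, 10, 10)), (0.6, (1, 1, 10, 10)), (0.2, (20, 20, 30, 30))]
print(label_detections(dets, gt))
```

In this toy case the 0.9 detection is a true positive, the 0.6 detection is a duplicate of an already matched box, and the 0.2 detection has no overlap at all.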
Evaluation metric
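Detections sorted by confidence: 0.9, 0.8, 0.6, 0.5, 0.2, 0.1, ... (✓ true positive, X false positive)

For a confidence threshold $t$:

$$\text{precision@}t = \frac{\#\,\text{true positives@}t}{\#\,\text{true positives@}t + \#\,\text{false positives@}t}$$

$$\text{recall@}t = \frac{\#\,\text{true positives@}t}{\#\,\text{ground truth objects}}$$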
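As a rough sketch, precision and recall at a threshold $t$ can be computed from a list of scored detections that have already been marked as true or false positives. The scores below are the ones on the slide, but the ✓/X pattern attached to them and the ground-truth count of 4 are made-up assumptions for illustration; `precision_recall_at` is a hypothetical helper name.

```python
def precision_recall_at(labels, num_ground_truth, t):
    """Precision and recall over detections with score >= t.

    labels: list of (score, is_true_positive) pairs.
    Returning precision 1.0 when nothing passes the threshold is a convention
    chosen here, not something the slides specify.
    """
    kept = [is_tp for score, is_tp in labels if score >= t]
    tp = sum(kept)
    fp = len(kept) - tp
    precision = tp / (tp + fp) if kept else 1.0
    recall = tp / num_ground_truth
    return precision, recall


# Hypothetical outcome for the six scored detections.
labels = [(0.9, True), (0.8, True), (0.6, False),
          (0.5, True), (0.2, False), (0.1, False)]
print(precision_recall_at(labels, num_ground_truth=4, t=0.5))  # (0.75, 0.75)
```

Lowering `t` keeps more detections, which can only raise recall but usually lowers precision.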
Evaluation metric
Average Precision (AP)
• 0% is worst
• 100% is best
• mean AP over classes (mAP)
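The slides do not spell out how AP is computed. One common reading, assumed here, is the area under the precision-recall curve traced out as the threshold sweeps down the ranked list, with precision first made non-increasing (the interpolation used by later PASCAL VOC evaluations). The sketch below follows that assumption; the function names and the toy data are hypothetical.

```python
def average_precision(labels, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    labels: list of (score, is_true_positive) pairs.
    """
    ranked = sorted(labels, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        tp += is_tp
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Interpolation: replace each precision by the best precision at this or
    # any later rank, so the curve never increases as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Add up precision times each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (labels, num_ground_truth)."""
    aps = [average_precision(lbls, n) for lbls, n in per_class.values()]
    return sum(aps) / len(aps)


# Made-up results for two classes.
person = [(0.9, True), (0.8, True), (0.6, False),
          (0.5, True), (0.2, False), (0.1, False)]
dog = [(0.9, True), (0.8, True), (0.6, True)]
print(average_precision(person, 4))  # 0.6875
print(mean_average_precision({"person": (person, 4), "dog": (dog, 3)}))  # 0.84375
```

A detector that ranks every true positive above every false positive and finds all ground-truth objects reaches AP = 1.0 (100%).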
Pedestrians
Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005
• AP ~77%
• More sophisticated methods: AP ~90%
Figure panels:
(a) average gradient image over training examples
(b) each “pixel” shows max positive SVM weight in the block centered on that pixel
(c) same as (b) for negative SVM weights
(d) test image
(e) its R-HOG descriptor
(f) R-HOG descriptor weighted by positive SVM weights
(g) R-HOG descriptor weighted by negative SVM weights
Why did it work?
Average gradient image
Generic categories
Can we detect people, chairs, horses, cars, dogs, buses, bottles, sheep …?
PASCAL Visual Object Classes (VOC) dataset
Why doesn’t this work (as well)?
Quiz time
Warm up
This is an average image of which object class?
pedestrian
A little harder
? Hint: airplane, bicycle, bus, car, cat, chair, cow, dog, dining table
bicycle (PASCAL)
A little harder, yet
? Hint: white blob on a green background
sheep (PASCAL)
Impossible?
?
dog (PASCAL)
Why does the mean look like this?
There’s no alignment between the examples! How do we combat this?
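A toy illustration of why misalignment washes out the mean, using a made-up 1-D "object" in place of images. Everything here (the template, the shifts, the helper names) is an assumption for illustration only.

```python
def shift(signal, k):
    """Circularly shift a 1-D list right by k positions."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def mean_signal(examples):
    """Element-wise mean of equal-length lists."""
    n = len(examples)
    return [sum(vals) / n for vals in zip(*examples)]


template = [0, 0, 0, 1, 1, 0, 0, 0]                   # a toy 1-D "object"
aligned = [template] * 4                              # object always in the same place
misaligned = [shift(template, k) for k in range(4)]   # object in a different place each time

print(mean_signal(aligned))     # keeps the sharp 0/1 structure of the template
print(mean_signal(misaligned))  # smeared out: no position reaches 1.0
```

The aligned mean reproduces the template, much like the pedestrian average, while the misaligned mean is a low-contrast blur, much like the dog average.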
PASCAL VOC detection history
[Plot: mean Average Precision (mAP) by year, 2006–2015, y-axis 0%–70%. Labeled entries: DPM; DPM, HOG+BOW; DPM, MKL; DPM++; DPM++, MKL, Selective Search; Selective Search, DPM++, MKL. mAP values shown: 17%, 23%, 28%, 37%, 41%, 41%]
Part-based models & multiple features (MKL)
Kitchen-sink approaches
increasing complexity & plateau
Region-based Convolutional Networks (R-CNNs)
[Plot: the same mAP-by-year history, with two added points: R-CNN v1 (53%) and R-CNN v2 (62%)]
[R-CNN. Girshick et al. CVPR 2014]
[Plot: mean Average Precision (mAP) by year, 2006–2015, annotated with “~5 years” and “~1 year”]
Convolutional Neural Networks
• Overview
Standard Neural Networks
$$\boldsymbol{x} = (x_1, \ldots, x_{784})^T, \qquad z_j = g(\boldsymbol{w}_j^T \boldsymbol{x}), \qquad g(t) = \frac{1}{1 + e^{-t}}$$
“Fully connected”
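A minimal sketch of one fully connected layer, where every unit $z_j$ sees all 784 inputs (presumably a flattened 28×28 digit image). The number of units (10), the random weights, and the function names are assumptions made for illustration.

```python
import math
import random


def sigmoid(t):
    """Logistic function g(t) = 1 / (1 + e^{-t})."""
    return 1.0 / (1.0 + math.exp(-t))


def fully_connected(x, W):
    """Compute z_j = g(w_j^T x) for every row w_j of W."""
    return [sigmoid(sum(w_i * x_i for w_i, x_i in zip(w_j, x))) for w_j in W]


random.seed(0)
x = [random.random() for _ in range(784)]            # stand-in for a flattened image
W = [[random.gauss(0.0, 0.01) for _ in range(784)]   # 10 units, 784 weights each (assumed sizes)
     for _ in range(10)]
z = fully_connected(x, W)
print(len(z), len(W) * len(W[0]))  # 10 outputs, 7840 weights
```

Each unit has its own 784 weights, so the parameter count grows with both the input size and the number of units.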
From NNs to Convolutional NNs
• Local connectivity
• Shared (“tied”) weights
• Multiple feature maps
• Pooling
Convolutional NNs
• Local connectivity
• Each green unit is only connected to (3) neighboring blue units
compare
Convolutional NNs
• Shared (“tied”) weights
• All green units share the same parameters 𝒘
• Each green unit computes the same function, but with a different input window
[Figure: two green units, each applying weights 𝑤1, 𝑤2, 𝑤3 to its own input window]
Convolutional NNs
• Convolution with 1-D filter: [𝑤3,𝑤2,𝑤1]
• All green units share the same parameters 𝒘
• Each green unit computes the same function, but with a different input window
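The link between shared weights and convolution can be checked with a small sketch. It assumes 𝑤1 multiplies the leftmost input of each window; under that assumption, sliding the same weights across the input gives the same numbers as a textbook convolution with the flipped filter [𝑤3, 𝑤2, 𝑤1]. The input and weight values are made up.

```python
def sliding_window(x, w):
    """Each output unit applies the same weights w to a different window of x."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def convolve_valid(x, f):
    """Textbook 1-D convolution (filter flipped), keeping only full overlaps."""
    k = len(f)
    return [sum(f[m] * x[i + k - 1 - m] for m in range(k))
            for i in range(len(x) - k + 1)]


x = [1, 4, 0, 3, 2]
w = [1, 2, 3]                      # w1, w2, w3 (values are made up)
print(sliding_window(x, w))        # [9, 13, 12]
print(convolve_valid(x, w[::-1]))  # same result, using the filter [w3, w2, w1]
```

Only three weights are learned no matter how long the input is, which is the parameter saving that weight sharing buys.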
Convolutional NNs
• Multiple feature maps
• All orange units compute the same function but with different input windows
• Orange and green units compute different functions
Feature map 1 (array of green units), weights 𝑤1, 𝑤2, 𝑤3
Feature map 2 (array of orange units), weights 𝑤′1, 𝑤′2, 𝑤′3
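A short sketch of two feature maps computed from the same input, each with its own shared filter. The "green" and "orange" names follow the slide; the filter values and the helper name `feature_map` are made up for illustration.

```python
def feature_map(x, w):
    """Slide one shared filter w over the input x."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


x = [1, 4, 0, 3, 2]
filters = {
    "green": [1, 0, -1],         # hypothetical weights w1, w2, w3
    "orange": [0.5, 0.5, 0.5],   # hypothetical weights w'1, w'2, w'3
}
maps = {name: feature_map(x, w) for name, w in filters.items()}
print(maps["green"])   # [1, 1, -2]
print(maps["orange"])  # [2.5, 3.5, 2.5]
```

Units within one map share weights; units in different maps look at the same windows but respond to different patterns.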
Convolutional NNs
• Pooling (max, average)
• Pooling area: 2 units
• Pooling stride: 2 units
• Subsamples feature maps
Example (max pooling): inputs 1, 4, 0, 3 give outputs 4, 3
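The pooling step can be sketched as follows, using the slide's area of 2 units and stride of 2 units on the values 1, 4, 0, 3. The function names are hypothetical.

```python
def pool(units, area=2, stride=2, op=max):
    """Apply `op` to each window of `area` units, moving `stride` units at a time."""
    return [op(units[i:i + area]) for i in range(0, len(units) - area + 1, stride)]


def average(window):
    """Mean of a window, for average pooling."""
    return sum(window) / len(window)


print(pool([1, 4, 0, 3]))              # max pooling: [4, 3]
print(pool([1, 4, 0, 3], op=average))  # average pooling: [2.5, 1.5]
```

With stride equal to the pooling area, the output has half as many units as the input, which is the subsampling the slide mentions.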
2D input
[Figure labels: Image, Convolution, Pooling]
1989
Backpropagation applied to handwritten zip code recognition, LeCun et al., 1989
Historical perspective – 1980
Hubel and Wiesel 1962
Included basic ingredients of ConvNets, but no supervised learning algorithm
Supervised learning – 1986
Early demonstration that error backpropagation can be used for supervised training of neural nets (including ConvNets)
Gradient descent training with error backpropagation
“T” vs. “C” problem
Simple ConvNet
Practical ConvNets
Gradient-Based Learning Applied to Document Recognition, LeCun et al., 1998
Demo
• http://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html
• ConvNetJS by Andrej Karpathy (Ph.D. student at Stanford)
Software libraries
• Caffe (C++, Python, MATLAB)
• Torch7 (C++, Lua)
• Theano (Python)
The fall of ConvNets
• The rise of Support Vector Machines (SVMs)
  • Mathematical advantages (theory, convex optimization)
  • Competitive performance on tasks such as digit classification
• Neural nets became unpopular in the mid 1990s
The key to SVMs
• It’s all about the features
Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005
SVM weights (+) (-)
HOG features
Core idea of “deep learning”
• Input: the “raw” signal (image, waveform, …)
• Features: hierarchy of features is learned from the raw input
• If SVMs killed neural nets, how did they come back (in computer vision)?
What’s new since the 1980s?
• More layers
  • LeNet-3 and LeNet-5 had 3 and 5 learnable layers
  • Current models have 8 – 20+
• “ReLU” non-linearities (Rectified Linear Unit)
  • 𝑔(𝑥) = max(0, 𝑥)
  • Gradient doesn’t vanish (for 𝑥 > 0)
• “Dropout” regularization
• Fast GPU implementations
• More data
[Plot: 𝑔(𝑥) vs. 𝑥]
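To make the "gradient doesn't vanish" point concrete, the sketch below compares the derivative of ReLU with the derivative of the sigmoid used in the standard network earlier in the lecture. The sample inputs and function names are arbitrary choices for illustration.

```python
import math


def relu(x):
    """Rectified Linear Unit: g(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    """Logistic function 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: g(x) * (1 - g(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (0.5, 5.0, 10.0):
    print(x, relu_grad(x), sigmoid_grad(x))
```

For large positive inputs the sigmoid's derivative shrinks toward zero while ReLU's stays at 1, so gradients passed back through many ReLU layers are not repeatedly scaled down by the non-linearity.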
Ross’s Own System: Region CNNs
Competitive Results
Top Regions for Six Object Classes