![Page 1: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/1.jpg)
A Deep Belief Network Approach to Learning Depth from Optical Flow
Applied Mathematics Honors Thesis
by Reuben Feinman
![Page 2: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/2.jpg)
Background
• The visual systems of insects are exquisitely sensitive to motion
• Srinivasan et al. (1989) showed that bees judge the range of their targets from absolute motion and from motion relative to the background
• Key idea: optical flow is important to navigation
![Page 3: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/3.jpg)
Motion Parallax in the Dorsal Stream
Humans perceive depth rather precisely via motion parallax
• Motion is a powerful monocular cue to depth understanding
• Assists with interpretation of spatial relationships
• “Optical flow”: the motion information encoded in the visual system
source: opticflow.bu.edu
![Page 4: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/4.jpg)
Deep Learning
• The mapping from motion to depth is highly nonlinear (Braunstein, 1976)
• Great progress in deep learning: multiple layers of nonlinear processing allow a more complex input-to-output function
source: www.deeplearning.stanford.edu
[Diagram: Motion Information -> Depth prediction]
![Page 5: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/5.jpg)
Computer Graphics
• Need labeled training data; videos do not have ground truth depth
• Graphical scenes generated by a gaming engine provide a large number of training samples for supervised learning
[Figure: A scene excerpt from our CryEngine forest database. Panels: RGB frame, ground truth depth map]
![Page 6: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/6.jpg)
MT Motion Model
• Hierarchical model of motion processing; alternates between template matching and max pooling
• Convolutional learning of spatio-temporal features
• Extension of HMAX (Serre et al., 2007)
Jhuang et al., 2007
![Page 7: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/7.jpg)
Population Responses
The dorsal velocity model outputs a motion energy feature map
• Shape: (# Speeds) x (# Directions) x Height x Width
• In other words: each pixel contains a feature vector X with (# Speeds) x (# Directions) dimensions
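As an illustration of this layout, the sketch below builds a small (# Speeds) x (# Directions) x Height x Width map from nested Python lists and reads off the feature vector for one pixel. The function name `pixel_feature_vector`, the toy spatial size, and the speed-major ordering of the flattened vector are assumptions made for the example, not details from the slides.

```python
def pixel_feature_vector(fmap, row, col):
    """Flatten the (speed, direction) responses at one pixel into a vector.

    fmap is indexed as fmap[speed][direction][row][col].
    The speed-major ordering is an assumption for illustration.
    """
    return [fmap[s][d][row][col]
            for s in range(len(fmap))
            for d in range(len(fmap[0]))]


# Toy map: 9 speeds x 8 directions x 4 rows x 5 columns (spatial size is arbitrary).
n_speeds, n_dirs, height, width = 9, 8, 4, 5
fmap = [[[[float(s * n_dirs + d) for _ in range(width)]
          for _ in range(height)]
         for d in range(n_dirs)]
        for s in range(n_speeds)]

x = pixel_feature_vector(fmap, row=2, col=3)
print(len(x))  # 72 = 9 speeds * 8 directions
```

Each entry of `x` is the response of one (speed, direction) channel at that pixel.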
![Page 8: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/8.jpg)
Deep Belief Networks
• A standard MLP fails on this task
• Lots of unlabeled data is available; maybe we can exploit it to extract deep hierarchical representations of our motion model outputs
• Initialize the network with feature detectors learned from the unlabeled data
source: http://deeplearning.net
![Page 9: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/9.jpg)
The RBM Model
Maximum likelihood learning: update model parameters to maximize the likelihood of our training data
Standard RBM:
Gaussian-Bernoulli RBM:
P(v,h) = (1/Z)*exp(-E(v,h))
We then define a “free energy” F(v), which sums over all possible hidden states, so that the marginal over the visible units is:
P(v) = (1/Z)*exp(-F(v))
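The energy functions E(v,h) for the two models are not shown above. For reference, forms commonly used in the literature are sketched below. The symbols (weights $W$, visible biases $a$, hidden biases $b$, per-unit standard deviations $\sigma_i$) are assumptions and may differ from the notation on the original slide, and the Gaussian-Bernoulli energy has several variants.

```latex
% Standard (Bernoulli-Bernoulli) RBM
E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j

% Gaussian-Bernoulli RBM (one common parameterization)
E(v,h) = \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_j b_j h_j
         - \sum_{i,j} \frac{v_i}{\sigma_i} W_{ij} h_j

% Free energy: marginalize over the hidden units
F(v) = -\log \sum_h \exp(-E(v,h)), \qquad P(v) = \frac{1}{Z}\exp(-F(v))
```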
source: http://deeplearning.net
![Page 10: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/10.jpg)
Justifying Greedy Layer-Wise Pre-Training
• We use a Markov chain with alternating Gibbs sampling:
  h’ ~ P(h | v = v)
  v’ ~ P(v | h = h’)
• Gibbs sampling is guaranteed to reduce the KL divergence between the posterior distribution in a given layer and the model’s equilibrium distribution
Hinton et al 2006
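The alternating update above can be sketched for a small binary RBM as follows. This is a minimal illustration under stated assumptions: the units are binary with sigmoid conditionals, and the weights, biases, and function names are hypothetical toy values rather than anything from the thesis.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_hidden(v, W, b_hid, rng):
    """Draw h' ~ P(h | v) for a binary RBM; W[i][j] links visible i to hidden j."""
    probs = [sigmoid(b_hid[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(b_hid))]
    return [1 if rng.random() < p else 0 for p in probs]


def sample_visible(h, W, b_vis, rng):
    """Draw v' ~ P(v | h) for a binary RBM."""
    probs = [sigmoid(b_vis[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b_vis))]
    return [1 if rng.random() < p else 0 for p in probs]


def gibbs_step(v, W, b_vis, b_hid, rng):
    """One alternating Gibbs step: v -> h' -> v'."""
    h_new = sample_hidden(v, W, b_hid, rng)
    v_new = sample_visible(h_new, W, b_vis, rng)
    return h_new, v_new


rng = random.Random(0)
W = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]  # 3 visible x 2 hidden, toy values
h1, v1 = gibbs_step([1, 0, 1], W, [0.0, 0.0, 0.0], [0.0, 0.0], rng)
print(h1, v1)
```

Repeating `gibbs_step` on its own output runs the chain for more steps.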
![Page 11: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/11.jpg)
The DBN
• The data: feature vectors have 72 elements, tuned to 9 different speeds and 8 directions (9*8 = 72)
• The DBN takes a 3x3 pixel window as input
• 3 hidden layers of 800 units; sigmoidal activation
• Linear output layer
Technicalities:
• Mini-batch training with a batch size of 5000
• Sparse initialization scheme
• RMSprop learning rule (root mean square propagation)
• Backpropagation fine-tuning with dropout, dropping 20% of units at each layer except for the input layer
• Geometrically decaying learning rate (LR = 0.998*LR at each epoch)
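Two of these numbers can be checked with a few lines of arithmetic. The sketch below assumes the nine per-pixel vectors in the 3x3 window are concatenated into one input vector, which the slides imply but do not state, and it uses a hypothetical initial learning rate of 0.01, since no starting value is given.

```python
def input_dimension(n_speeds=9, n_directions=8, window=3):
    """Number of DBN inputs: one 72-element vector per pixel in the window."""
    return n_speeds * n_directions * window * window


def learning_rate(initial_lr, epoch, decay=0.998):
    """Geometric decay: the rate is multiplied by `decay` once per epoch."""
    return initial_lr * decay ** epoch


print(input_dimension())         # 648 = 72 features * 9 pixels
print(learning_rate(0.01, 0))    # 0.01 (hypothetical starting rate)
print(learning_rate(0.01, 100))  # rate after 100 epochs of decay
```

With a decay factor this close to 1, the rate shrinks slowly: it takes several hundred epochs to halve.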
![Page 12: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/12.jpg)
Results
[Figure panels: DBN (test set R^2: 0.445), Linear Regression (test set R^2: 0.240), Ground Truth]
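The slides do not say how the R^2 score was computed. A minimal sketch, assuming the standard coefficient of determination over predicted and ground truth depth values, is shown below; the depth values are made-up toy numbers.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


depth_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score(depth_true, depth_true))            # 1.0 for a perfect prediction
print(r2_score(depth_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0 for predicting the mean
```

Under this definition a score of 0 means the model does no better than always predicting the mean depth.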
![Page 13: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/13.jpg)
[Bar chart: R^2 Score per Model. Models compared: MLP (sparse initialization), single-pixel linear regression, 3x3 window linear regression, single-pixel DBN, 3x3 window DBN. Vertical axis: R^2 score, 0 to 0.5]
![Page 14: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/14.jpg)
Markov Random Field Smoothing
The receptive field can be a powerful tool for decoding
MRF defined by two potential functions:
1) Φ = ∑_i (w • x_i − d_i)^2
2) Ψ = ∑_<i,j> (d_i − d_j)^2 / ((d_i − d_j)^2 + 1)
(note: <i,j> = all neighboring pairs i,j)
P(d | x ; alpha, w) = (1/Z) * exp(−(alpha*Ψ + Φ))
Peter Orchard, University of Edinburgh
[Figure panels: ground truth; original prediction: 0.595; MRF prediction: 0.630]
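To make the two potentials concrete, the sketch below evaluates the energy alpha*Ψ + Φ on a toy chain of three pixels. The weights, features, depth values, neighbor list, and function names are all hypothetical. The slides do not describe the inference procedure, so this only scores candidate depth maps.

```python
def data_term(w, xs, ds):
    """Phi = sum_i (w . x_i - d_i)^2."""
    return sum((sum(wk * xk for wk, xk in zip(w, x)) - d) ** 2
               for x, d in zip(xs, ds))


def smoothness_term(ds, neighbors):
    """Psi = sum over neighboring pairs of diff^2 / (diff^2 + 1)."""
    total = 0.0
    for i, j in neighbors:
        diff_sq = (ds[i] - ds[j]) ** 2
        total += diff_sq / (diff_sq + 1.0)
    return total


def mrf_energy(w, xs, ds, neighbors, alpha):
    """Energy alpha*Psi + Phi; P(d | x) is proportional to exp(-energy)."""
    return alpha * smoothness_term(ds, neighbors) + data_term(w, xs, ds)


# Toy 1-D chain of three pixels with two features each (all values hypothetical).
w = [1.0, 0.5]
xs = [[1.0, 0.0], [1.0, 2.0], [0.0, 2.0]]
neighbors = [(0, 1), (1, 2)]
smooth = [1.0, 1.5, 1.0]
rough = [1.0, 4.0, 1.0]
e_smooth = mrf_energy(w, xs, smooth, neighbors, alpha=2.0)
e_rough = mrf_energy(w, xs, rough, neighbors, alpha=2.0)
print(e_smooth, e_rough)
```

The smoother depth assignment has the lower energy and therefore the higher probability. Because each pairwise term in Ψ saturates below 1, large depth discontinuities are penalized only up to a bound, which leaves room for sharp edges.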
![Page 15: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/15.jpg)
Drone Test
![Page 16: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/16.jpg)
![Page 17: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/17.jpg)
Future Work
• Increase the size of the pre-training dataset
• Collect labeled real-video data with an Xbox Kinect
• Down-sample motion features and ground truth
![Page 18: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/18.jpg)
Thanks!
• Thomas Serre
• Stuart Geman
• David Mely
• Youssef Barhomi
Questions?
![Page 19: Thesis Presentation](https://reader030.vdocuments.site/reader030/viewer/2022032714/55aae85a1a28abe6778b4588/html5/thumbnails/19.jpg)
Normalizing the Data
• Training a GB-RBM is hard; the distributions of spike firing rates have many variations depending on the dataset
• We propose a normalized GB-RBM where the training data is normalized to zero mean and unit variance; all datasets thereafter (validation & test) are normalized with the same parameters
Dataset histograms before and after normalization
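The normalization scheme described above can be sketched as follows: statistics are fitted on the training set only and then reused for every other split. Per-feature statistics, the small `eps` guard, and the function names are assumptions for this example; the slides do not say whether normalization was per feature or global.

```python
import math


def fit_normalizer(train):
    """Per-feature mean and standard deviation computed from training rows only."""
    n, dims = len(train), len(train[0])
    means = [sum(row[k] for row in train) / n for k in range(dims)]
    stds = [math.sqrt(sum((row[k] - means[k]) ** 2 for row in train) / n)
            for k in range(dims)]
    return means, stds


def normalize(rows, means, stds, eps=1e-8):
    """Apply previously fitted parameters; eps guards against zero variance."""
    return [[(v - m) / (s + eps) for v, m, s in zip(row, means, stds)]
            for row in rows]


train = [[0.0, 10.0], [2.0, 30.0], [4.0, 50.0]]
test = [[2.0, 30.0]]
means, stds = fit_normalizer(train)
train_norm = normalize(train, means, stds)
test_norm = normalize(test, means, stds)  # reuses the training statistics
print(test_norm)
```

Reusing the training parameters keeps validation and test inputs on the same scale the network saw during training.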