TRANSCRIPT
High Level Computer Vision
Recurrent Neural Networks, June 5, 2019
Bernt Schiele & Mario Fritz
www.mpi-inf.mpg.de/hlcv/ | Max Planck Institute for Informatics & Saarland University, Saarland Informatics Campus, Saarbrücken
Overview: Today’s Lecture
• Recurrent Neural Networks (RNNs)
‣ Motivation & flexibility of RNNs (some recap from last week)
‣ Language modeling - including “unreasonable effectiveness of RNNs”
‣ RNNs for image description / captioning
‣ Standard RNN and a particularly successful RNN: Long Short Term Memory (LSTM) - including “visualizations of RNN cells”
Recurrent Networks offer a lot of flexibility:
slide credit: Andrej Karpathy
Sequences in Vision: sequences in the input… (many-to-one)
Examples: Running, Jumping, Dancing, Fighting, Eating
slide credit: Jeff Donahue
Sequences in Vision: sequences in the output… (one-to-many)
Example: “A happy brown dog.”
slide credit: Jeff Donahue
Sequences in Vision: sequences everywhere! (many-to-many)
Example: “A dog jumps over a hurdle.”
slide credit: Jeff Donahue
ConvNets
Krizhevsky et al., NIPS 2012
slide credit: Jeff Donahue
Problem #1
ConvNets take a fixed-size, static input (e.g., a 224×224 image). What about inputs of other sizes, or sequences of inputs?
slide credit: Jeff Donahue
Problem #2 (Krizhevsky et al., NIPS 2012)
The output is a single choice from a fixed list of options: cat, dog, horse, fish, snake, … But for outputs like “a happy brown dog”, “a big brown dog”, “a happy red dog”, “a big red dog”, … no fixed list can enumerate all the possibilities.
slide credit: Jeff Donahue
Recurrent Networks offer a lot of flexibility:
slide credit: Andrej Karpathy
Recurrent Neural Network (RNN)
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
(Simple) Recurrent Neural Network
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
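The figures for these slides are not preserved. In the widely circulated version of this material (Fei-Fei/Johnson/Yeung), the RNN is defined by a recurrence h_t = f_W(h_{t-1}, x_t) applied with the same weights at every step, and the simple (“vanilla”) RNN instantiates it as h_t = tanh(Whh * h_{t-1} + Wxh * x_t) with outputs y_t = Why * h_t. A minimal NumPy sketch of one step (all sizes and the initialization are illustrative, not from the slides):

```python
import numpy as np

# Illustrative sizes: 300-D inputs, 500-D hidden state, 10,001 output classes.
D_in, D_h, D_out = 300, 500, 10001
rng = np.random.default_rng(0)
Wxh = rng.normal(0, 0.01, (D_h, D_in))    # input-to-hidden weights
Whh = rng.normal(0, 0.01, (D_h, D_h))     # hidden-to-hidden (recurrent) weights
Why = rng.normal(0, 0.01, (D_out, D_h))   # hidden-to-output weights

def rnn_step(h_prev, x):
    """One vanilla RNN step: h_t = tanh(Whh h_{t-1} + Wxh x_t), y_t = Why h_t."""
    h = np.tanh(Whh @ h_prev + Wxh @ x)
    y = Why @ h                            # unnormalized class scores
    return h, y

# Unrolling over a sequence reuses the same three weight matrices at every step.
h = np.zeros(D_h)
for x in rng.normal(size=(4, D_in)):       # a dummy 4-step input sequence
    h, y = rnn_step(h, x)
```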
RNN: Computational Graph
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
RNN: Computational Graph: Many to Many
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
RNN: Computational Graph: Many to One
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
RNN: Computational Graph: One to Many
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
Sequence to Sequence
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
Recurrent Networks offer a lot of flexibility:
slide credit: Andrej Karpathy
Language Models
Recurrent Neural Network Based Language Model [Tomas Mikolov, 2010]
Word-level language model. Similar to:
slide credit: Andrej Karpathy
Suppose we had the training sentence “cat sat on mat”.
We want to train a language model: P(next word | previous words)
i.e., we want these to be high:
P(cat | [<S>])
P(sat | [<S>, cat])
P(on | [<S>, cat, sat])
P(mat | [<S>, cat, sat, on])
P(<E> | [<S>, cat, sat, on, mat])
slide credit: Andrej Karpathy
Suppose we had the training sentence “cat sat on mat”.
We want to train a language model: P(next word | previous words)
First, suppose we had only a finite, 1-word history, i.e., we want these to be high:
P(cat | <S>)
P(sat | cat)
P(on | sat)
P(mat | on)
P(<E> | mat)
slide credit: Andrej Karpathy
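To make the factorization concrete: by the chain rule, P(sentence) is the product of these conditionals. A small sketch (the p_next function is a hypothetical stand-in for any model of P(next word | history); setting markov=1 truncates to the 1-word-history case above):

```python
def sentence_prob(words, p_next, markov=None):
    """P(w_1..w_n, <E>) = prod_t P(w_t | history); history truncated if markov is set."""
    history, prob = ["<S>"], 1.0
    for w in words + ["<E>"]:
        h = history if markov is None else history[-markov:]
        prob *= p_next(h, w)               # hypothetical model interface
        history.append(w)
    return prob

# Toy check with a uniform "model" over a 5-word vocabulary:
uniform = lambda history, word: 1.0 / 5
print(sentence_prob("cat sat on mat".split(), uniform))             # full history
print(sentence_prob("cat sat on mat".split(), uniform, markov=1))   # 1-word history
```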
Recurrent Neural Network:
[diagram: training sentence “cat sat on mat”; inputs x0 = <START>, x1 = “cat”, x2 = “sat”, x3 = “on”, x4 = “mat”; hidden states h0 … h4; outputs y0 … y4]
Word vectors: 300 (learnable) numbers associated with each word in the vocabulary.
Hidden layer (e.g., 500-D vectors): h4 = tanh(Wxh * x4 + Whh * h3)
10,001-D class scores: softmax over 10,000 words and a special <END> token: y4 = Why * h4
slide credit: Andrej Karpathy
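Putting the pieces of this slide together, a rough sketch of the forward pass. The sizes follow the slide (300-D word vectors, 500-D hidden states, a softmax over 10,000 words plus <END>); everything not named on the slide, including the initialization, is illustrative:

```python
import numpy as np

V, D_emb, D_h = 10001, 300, 500            # vocab incl. <END>, embedding, hidden
rng = np.random.default_rng(0)
E   = rng.normal(0, 0.01, (V, D_emb))      # 300 learnable numbers per word
Wxh = rng.normal(0, 0.01, (D_h, D_emb))
Whh = rng.normal(0, 0.01, (D_h, D_h))
Why = rng.normal(0, 0.01, (V, D_h))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(word_ids):
    """Return P(next word) at every position, plus saved states for backprop."""
    h, probs, xs, hs = np.zeros(D_h), [], [], []
    for t in word_ids:
        x = E[t]                           # embedding lookup
        h = np.tanh(Wxh @ x + Whh @ h)     # h_t = tanh(Wxh x_t + Whh h_{t-1})
        probs.append(softmax(Why @ h))     # scores y_t = Why h_t, then softmax
        xs.append(x); hs.append(h)
    return probs, xs, hs

# Training maximizes probs[t][target_t] for <S> cat sat on mat -> cat ... <END>.
```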
Generating Sentences…
Training this on a lot of sentences would give us a language model: a way to predict P(next word | previous words).
To generate, feed x0 = <START>, compute h0 and the scores y0, and sample a word from the softmax (“cat”); feed the sample back in as x1, compute h1 and y1, and sample again (“sat”); repeat (“on”, “mat”, …). When the model samples <END>: done.
slide credit: Andrej Karpathy
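A sketch of this sampling loop, reusing the definitions from the previous sketch (the ids of <START> and <END> are assumptions for illustration):

```python
def sample_sentence(max_len=20):
    """Feed <START>; repeatedly sample a word and feed it back in; stop at <END>."""
    START_ID, END_ID = 0, V - 1            # assumed ids for the special tokens
    h, t, out = np.zeros(D_h), START_ID, []
    for _ in range(max_len):
        h = np.tanh(Wxh @ E[t] + Whh @ h)
        t = rng.choice(V, p=softmax(Why @ h))   # sample! (not argmax)
        if t == END_ID:
            break                          # sampled <END>? done.
        out.append(t)
    return out                             # word ids; map back through the vocab
```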
Example…
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
Learning via Backpropagation…
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
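The figures are gone here as well; in the standard version of these slides, the loss is summed over the unrolled time steps and gradients flow backward through every step (backpropagation through time, run over truncated chunks for long sequences). A condensed sketch of the backward pass for the vanilla RNN of the earlier sketch, in the style of Karpathy's min-char-rnn:

```python
def bptt(xs, hs, dys):
    """Backprop through time for h_t = tanh(Wxh x_t + Whh h_{t-1}), y_t = Why h_t.

    xs, hs: inputs and hidden states saved by the forward pass (with h_{-1} = 0);
    dys: gradient of the loss w.r.t. the scores y_t at each step.
    """
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dh_next = np.zeros(D_h)
    for t in reversed(range(len(xs))):
        dWhy += np.outer(dys[t], hs[t])
        dh = Why.T @ dys[t] + dh_next              # from y_t and from h_{t+1}
        dpre = (1.0 - hs[t] ** 2) * dh             # through the tanh
        dWxh += np.outer(dpre, xs[t])
        dWhh += np.outer(dpre, hs[t - 1] if t > 0 else np.zeros(D_h))
        dh_next = Whh.T @ dpre                     # gradient flowing to h_{t-1}
    return dWxh, dWhh, dWhy
```

Truncated BPTT runs this forward/backward pass over fixed-length chunks, carrying the hidden state between chunks but not the gradient.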
“The Unreasonable Effectiveness of Recurrent Neural Networks”
karpathy.github.io
slide credit: Andrej Karpathy
Character-level language model example
Vocabulary: [h,e,l,o]
Example training sequence: “hello”
slide credit: Andrej Karpathy
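Concretely (vocabulary and training sequence as on the slide; the one-hot encoding is the standard choice): each character is fed in and trained to predict the next one, so “hello” yields inputs h, e, l, l and targets e, l, l, o:

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {c: i for i, c in enumerate(vocab)}

def one_hot(c):
    v = np.zeros(len(vocab))
    v[char_to_ix[c]] = 1.0
    return v

text = "hello"
inputs  = [one_hot(c) for c in text[:-1]]    # h, e, l, l
targets = [char_to_ix[c] for c in text[1:]]  # e, l, l, o
# Note: the two 'l' inputs need different predictions ('l', then 'o'); only the
# hidden state, which carries the history, makes that possible.
```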
Try it yourself: char-rnn on GitHub (uses Torch7)
slide credit: Andrej Karpathy
Cooking Recipes
slide credit: Andrej Karpathy
Obama Speeches
slide credit: Andrej Karpathy
Learning from Linux Source Code
slide credit: Andrej Karpathy
Yoav Goldberg’s n-gram experiments: samples from an order-10 n-gram model, trained on Shakespeare and on the Linux source.
slide credit: Andrej Karpathy
Image captioning, training example: “straw hat”
[diagram: a CNN encodes the image into a feature vector v; the RNN receives <START>, “straw”, “hat” as inputs x0, x1, x2 and is trained so that y0, y1, y2 predict “straw”, “hat”, <END>]
before: h0 = tanh(Wxh * x0)
now: h0 = tanh(Wxh * x0 + Wih * v)
slide credit: Andrej Karpathy
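A sketch of this change, reusing the language-model sketch above (Wih and the image feature vector v are named as on the slide; the CNN itself and the feature size are placeholders):

```python
D_img = 4096                                  # placeholder CNN feature size
Wih = rng.normal(0, 0.01, (D_h, D_img))       # image-to-hidden weights

def h0_with_image(x0, v):
    """First hidden state, now conditioned on CNN image features v."""
    # before: h0 = tanh(Wxh @ x0)
    # now:    h0 = tanh(Wxh @ x0 + Wih @ v)
    return np.tanh(Wxh @ x0 + Wih @ v)
```

Only the first step changes; the recurrence and the softmax over words stay exactly as in the language model.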
Test time: given a test image, feed x0 = <START>, compute h0 (now conditioned on the image) and y0, and sample a word (“straw”); feed it back in, compute h1 and y1, and sample again (“hat”); sampling the <END> token finishes the caption.
slide credit: Andrej Karpathy
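Combining the previous sketches, test-time captioning is the same sampling loop as for sentences, with the image features injected once at the first step (special-token ids again assumed):

```python
def caption(v, max_len=20):
    """Generate a caption for CNN image features v by iterated sampling."""
    START_ID, END_ID = 0, V - 1
    h = np.tanh(Wxh @ E[START_ID] + Wih @ v)    # image enters only at t = 0
    words = []
    for _ in range(max_len):
        t = rng.choice(V, p=softmax(Why @ h))   # sample a word from y_t
        if t == END_ID:
            break                               # <END> token => finish
        words.append(t)
        h = np.tanh(Wxh @ E[t] + Whh @ h)       # feed the sample back in
    return words
```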
Wow I can’t believe that worked
Well, I can kind of see it
Not sure what happened there...
slide credit: Jeff Donahue
Vanilla RNN...
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
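The figures are not preserved; in the widely used version of these slides, the point is that backpropagating from h_t to h_{t-1} multiplies the gradient by Whh^T (plus a tanh factor) at every step, so over long sequences the gradient grows or shrinks geometrically depending on whether the largest singular value of Whh is above or below 1, motivating gradient clipping and, ultimately, the LSTM. A small numeric demonstration (a symmetric Whh is used so that repeated multiplication is governed by the same largest singular value; the tanh factor is ignored for clarity):

```python
import numpy as np

rng = np.random.default_rng(0)
for scale in (0.9, 1.1):                        # top singular value < 1 vs > 1
    A = rng.normal(size=(500, 500))
    W = (A + A.T) / 2                           # symmetric: singular values = |eigenvalues|
    W *= scale / np.linalg.norm(W, 2)           # set the largest singular value
    g = rng.normal(size=500)                    # some upstream gradient
    for _ in range(100):
        g = W.T @ g                             # one backprop step h_t -> h_{t-1}
    print(scale, np.linalg.norm(g))             # vanishes for 0.9, explodes for 1.1
```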
Long Short Term Memory (LSTM)
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
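The slide figures are not preserved; the LSTM update this material presents is the standard one: four gates computed from (x_t, h_{t-1}), an additive cell-state update, and a gated output. A minimal sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W stacks the four gate weights: shape (4*H, D+H), b: (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0*H:1*H])       # input gate: how much to write to the cell
    f = sigmoid(z[1*H:2*H])       # forget gate: how much old cell state to keep
    o = sigmoid(z[2*H:3*H])       # output gate: how much cell state to reveal
    g = np.tanh(z[3*H:4*H])       # candidate cell content
    c = f * c_prev + i * g        # additive cell update (no matrix multiply)
    h = o * np.tanh(c)
    return h, c
```

In the standard version of the “Gradient Flow” slides that follow, the key point is exactly this additive path: backpropagating through c_t involves only elementwise scaling by the forget gate, not repeated multiplication by a recurrent weight matrix, so gradients can flow back through many steps largely uninterrupted.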
Long Short Term Memory (LSTM): Gradient Flow
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
Alternatives
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
Summary
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
Visualizing and Understanding Recurrent Networks
Andrej Karpathy*, Justin Johnson*, Li Fei-Fei (on arXiv.org)
slide credit: Andrej Karpathy
Hunting interpretable cells
• quote detection cell
• line length tracking cell
• if statement cell
• quote/comment cell
• code depth cell
• something interesting cell (not quite sure what)
slide credit: Andrej Karpathy
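The heatmap figures are not preserved. The method behind these slides (from the paper above) is to pick a single cell of a trained character-level LSTM, run it over text, and color each character by tanh(c_t[i]), a value in [-1, 1]. A rough sketch, reusing lstm_step from above (the trained weights W, b and the one-hot encode function are assumed to exist):

```python
def cell_trace(text, unit, W, b, encode):
    """Return tanh(c_t[unit]) per character: the value used to color each character."""
    H = b.shape[0] // 4
    h, c = np.zeros(H), np.zeros(H)
    values = []
    for ch in text:
        h, c = lstm_step(encode(ch), h, c, W, b)
        values.append(np.tanh(c[unit]))    # one cell's activation, in [-1, 1]
    return values

# A quote-detection cell, for instance, would sit near +1 inside "..." and
# near -1 outside; a line-length cell would ramp up until each newline.
```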