high level computer vision recurrent neural …...“cat sat on mat” x2 “sat” h2 y2 x3...

117
High Level Computer Vision Recurrent Neural Networks @ June 5, 2019 Bernt Schiele & Mario Fritz www.mpi-inf.mpg.de/hlcv/ Max Planck Institute for Informatics & Saarland University, Saarland Informatics Campus Saarbrücken

Upload: others

Post on 22-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision

Recurrent Neural Networks@ June 5, 2019

Bernt Schiele & Mario Fritz

www.mpi-inf.mpg.de/hlcv/ Max Planck Institute for Informatics & Saarland University, Saarland Informatics Campus Saarbrücken

Page 2: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Overview Today’s Lecture

• Recurrent Neural Networks (RNNs) ‣ Motivation & flexibility of RNNs (some recap from last week)

‣ Language modeling - including “unreasonable effectiveness of RNNs”

‣ RNNs for image description / captioning

‣ Standard RNN and a particularly successful RNN: Long Short Term Memory (LSTM) - including “visualizations of RNN cells”

!2

Page 3: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Recurrent Networks offer a lot of flexibility:

slidecredit:AndrejKarpathy

Page 4: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Sequences in VisionSequences in the input… (many-to-one)

JumpingDancingFightingEating

Running

slidecredit:JeffDonahue

Page 5: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Sequences in VisionSequences in the output… (one-to-many)

A happy brown dog.

slidecredit:JeffDonahue

Page 6: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Sequences in VisionSequences everywhere! (many-to-many)

A dog jumps over a hurdle.

slidecredit:JeffDonahue

Page 7: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

ConvNets

Krizhevsky et al., NIPS 2012

slidecredit:JeffDonahue

Page 8: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Problem #1

fixed-size, static input

224

224

slidecredit:JeffDonahue

Page 9: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Problem #1

fixed-size, static input

224

224

???

slidecredit:JeffDonahue

Page 10: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Problem #2

Krizhevsky et al., NIPS 2012

slidecredit:JeffDonahue

Page 11: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

output is a single choice from a fixed list of options

doghorsefishsnake

cat

Problem #2slidecredit:JeffDonahue

Page 12: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

output is a single choice from a fixed list of options

a happy brown doga big brown doga happy red doga big red dog

Problem #2

slidecredit:JeffDonahue

Page 13: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Recurrent Networks offer a lot of flexibility:

slidecredit:AndrejKarpathy

Page 14: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Recurrent Neural Network (RNN)

!14

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 15: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Recurrent Neural Network (RNN)

!15

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 16: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Recurrent Neural Network (RNN)

!16

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 17: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Recurrent Neural Network (RNN)

!17

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 18: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

(Simple) Recurrent Neural Network

!18

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 19: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

RNN: Computational Graph

!19

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 20: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

RNN: Computational Graph

!20

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 21: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

RNN: Computational Graph

!21

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 22: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

RNN: Computational Graph: Many to Many

!22

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 23: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

RNN: Computational Graph: Many to Many

!23

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 24: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz !24

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 25: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

RNN: Computational Graph: Many to One

!25

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 26: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

RNN: Computational Graph: One to Many

!26

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 27: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Sequence to Sequence

!27

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 28: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Sequence to Sequence

!28

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 29: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Recurrent Networks offer a lot of flexibility:

slidecredit:AndrejKarpathy

Page 30: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Language Models

Recurrent Neural Network Based Language Model [Tomas Mikolov, 2010]

Word-level language model. Similar to:

slidecredit:AndrejKarpathy

Page 31: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Suppose we had the training sentence “cat sat on mat”

We want to train a language model: P(next word | previous words)

i.e. want these to be high: P(cat | [<S>]) P(sat | [<S>, cat]) P(on | [<S>, cat, sat]) P(mat | [<S>, cat, sat, on]) P(<E>| [<S>, cat, sat, on, mat])

slidecredit:AndrejKarpathy

Page 32: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Suppose we had the training sentence “cat sat on mat”

We want to train a language model: P(next word | previous words)

First, suppose we had only a finite, 1-word history: i.e. want these to be high: P(cat | <S>) P(sat | cat) P(on | sat) P(mat | on) P(<E>| mat)

slidecredit:AndrejKarpathy

Page 33: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

h0

x0 <START>

y0

x1 “cat”

h1

y1

“cat sat on mat”

x2 “sat”

h2

y2

x3 “on”

h3

y3

x4 “mat”

h4

y4

300 (learnable) numbers associated with each word in vocabulary

slidecredit:AndrejKarpathy

Page 34: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

h0

x0 <START>

y0

x1 “cat”

h1

y1

“cat sat on mat”

x2 “sat”

h2

y2

x3 “on”

h3

y3

x4 “mat”

h4

y4

300 (learnable) numbers associated with each word in vocabulary

hidden layer (e.g. 500-D vectors) h4 = tanh(0, Wxh * x4)

slidecredit:AndrejKarpathy

Page 35: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

h0

x0 <START>

y0

x1 “cat”

h1

y1

“cat sat on mat”

x2 “sat”

h2

y2

x3 “on”

h3

y3

x4 “mat”

h4

y4

300 (learnable) numbers associated with each word in vocabulary

10,001-D class scores: Softmax over 10,000 words and a special <END> token. y4 = Why * h4

hidden layer (e.g. 500-D vectors) h4 = tanh(0, Wxh * x4)

slidecredit:AndrejKarpathy

Page 36: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

h0

x0 <START>

y0

x1 “cat”

h1

y1

“cat sat on mat”

x2 “sat”

h2

y2

x3 “on”

h3

y3

x4 “mat”

h4

y4

300 (learnable) numbers associated with each word in vocabulary

10,001-D class scores: Softmax over 10,000 words and a special <END> token. y4 = Why * h4

hidden layer (e.g. 500-D vectors) h4 = tanh(0, Wxh * x4 + Whh * h3)

Recurrent Neural Network:

slidecredit:AndrejKarpathy

Page 37: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Training this on a lot of sentences would give us a language model. A way to predict

P(next word | previous words)

x0 <START>

slidecredit:AndrejKarpathy

Generating Sentences...

Page 38: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Training this on a lot of sentences would give us a language model. A way to predict

P(next word | previous words) h0

x0 <START>

y0

slidecredit:AndrejKarpathy

Generating Sentences...

Page 39: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Training this on a lot of sentences would give us a language model. A way to predict

P(next word | previous words) h0

x0 <START>

y0

x1 “cat”

sample!

slidecredit:AndrejKarpathy

Generating Sentences...

Page 40: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Training this on a lot of sentences would give us a language model. A way to predict

P(next word | previous words) h0

x0 <START>

y0

x1 “cat”

h1

y1

slidecredit:AndrejKarpathy

Generating Sentences...

Page 41: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Training this on a lot of sentences would give us a language model. A way to predict

P(next word | previous words) h0

x0 <START>

y0

x1 “cat”

h1

y1

x2 “sat”

sample!

slidecredit:AndrejKarpathy

Generating Sentences...

Page 42: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Training this on a lot of sentences would give us a language model. A way to predict

P(next word | previous words) h0

x0 <START>

y0

x1 “cat”

h1

y1

x2 “sat”

h2

y2

slidecredit:AndrejKarpathy

Generating Sentences...

Page 43: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Training this on a lot of sentences would give us a language model. A way to predict

P(next word | previous words) h0

x0 <START>

y0

x1 “cat”

h1

y1

x2 “sat”

h2

y2

x3 “on”

sample!

slidecredit:AndrejKarpathy

Generating Sentences...

Page 44: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Training this on a lot of sentences would give us a language model. A way to predict

P(next word | previous words) h0

x0 <START>

y0

x1 “cat”

h1

y1

x2 “sat”

h2

y2

x3 “on”

h3

y3

slidecredit:AndrejKarpathy

Generating Sentences...

Page 45: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Training this on a lot of sentences would give us a language model. A way to predict

P(next word | previous words) h0

x0 <START>

y0

x1 “cat”

h1

y1

x2 “sat”

h2

y2

x3 “on”

h3

y3

x4 “mat”

sample!

slidecredit:AndrejKarpathy

Generating Sentences...

Page 46: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Training this on a lot of sentences would give us a language model. A way to predict

P(next word | previous words) h0

x0 <START>

y0

x1 “cat”

h1

y1

x2 “sat”

h2

y2

x3 “on”

h3

y3

x4 “mat”

h4

y4

slidecredit:AndrejKarpathy

Generating Sentences...

Page 47: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Training this on a lot of sentences would give us a language model. A way to predict

P(next word | previous words) h0

x0 <START>

y0

x1 “cat”

h1

y1

x2 “sat”

h2

y2

x3 “on”

h3

y3

x4 “mat”

h4

y4

samples <END>? done.

slidecredit:AndrejKarpathy

Generating Sentences...

Page 48: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Example…

!48

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 49: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Example…

!49

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 50: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Example…

!50

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 51: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Example…

!51

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 52: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Example…

!52

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 53: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Example…

!53

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 54: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Example…

!54

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 55: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Learning via Backpropagation…

!55

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 56: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Learning via Backpropagation…

!56

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 57: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Learning via Backpropagation…

!57

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 58: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Learning via Backpropagation…

!58

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 59: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

“The Unreasonable Effectiveness of Recurrent Neural Networks”

karpathy.github.io

slidecredit:AndrejKarpathy

Page 60: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Character-level language model example

Vocabulary: [h,e,l,o]

Example training sequence: “hello”

slidecredit:AndrejKarpathy

Page 61: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

slidecredit:AndrejKarpathy

Page 62: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

slidecredit:AndrejKarpathy

Page 63: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

slidecredit:AndrejKarpathy

Page 64: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

slidecredit:AndrejKarpathy

Page 65: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Try it yourself: char-rnn on Github (uses Torch7)

slidecredit:AndrejKarpathy

Page 66: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Cooking Recipes

slidecredit:AndrejKarpathy

Page 67: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Obama Speeches

slidecredit:AndrejKarpathy

Page 68: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

slidecredit:AndrejKarpathy

Page 69: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

slidecredit:AndrejKarpathy

Page 70: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Learning from Linux Source Code

slidecredit:AndrejKarpathy

Page 71: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

slidecredit:AndrejKarpathy

Page 72: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

slidecredit:AndrejKarpathy

Page 73: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

slidecredit:AndrejKarpathy

Page 74: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Yoav Goldberg n-gram experiments

Order 10 ngram model on Shakespeare:

slidecredit:AndrejKarpathy

Page 75: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

But on Linux:

slidecredit:AndrejKarpathy

Page 76: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

“straw hat”

training example

slidecredit:AndrejKarpathy

Page 77: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

“straw hat”

training example

slidecredit:AndrejKarpathy

Page 78: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

“straw hat”

training example

X

slidecredit:AndrejKarpathy

Page 79: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

“straw hat”

training example

X

h0

x0 <START>

y0

x1 “straw”

h1

y1

x2 “hat”

h2

y2

<START> straw hat

slidecredit:AndrejKarpathy

Page 80: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

“straw hat”

training example

X

h0

x0 <START>

y0

x1 “straw”

h1

y1

x2 “hat”

h2

y2

<START> straw hat

before: h0 = tanh(0, Wxh * x0)

now: h0 = tanh(0, Wxh * x0 + Wih * v)

slidecredit:AndrejKarpathy

Page 81: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

test image

slidecredit:AndrejKarpathy

Page 82: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

test image

x0 <START>

<START>

slidecredit:AndrejKarpathy

Page 83: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

h0

x0 <START>

y0

<START>

test image

slidecredit:AndrejKarpathy

Page 84: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

h0

x0 <START>

y0

<START>

test image

straw

sample!

slidecredit:AndrejKarpathy

Page 85: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

h0

x0 <START>

y0

<START>

test image

straw

h1

y1

slidecredit:AndrejKarpathy

Page 86: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

h0

x0 <START>

y0

<START>

test image

straw

h1

y1

hat

sample!

slidecredit:AndrejKarpathy

Page 87: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

h0

x0 <START>

y0

<START>

test image

straw

h1

y1

hat

h2

y2

slidecredit:AndrejKarpathy

Page 88: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

h0

x0 <START>

y0

<START>

test image

straw

h1

y1

hat

h2

y2

sample! <END> token => finish.

slidecredit:AndrejKarpathy

Page 89: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Wow I can’t believe that worked

slidecredit:JeffDonahue

Page 90: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Wow I can’t believe that worked

slidecredit:JeffDonahue

Page 91: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Well, I can kind of see it

slidecredit:JeffDonahue

Page 92: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Well, I can kind of see it

slidecredit:JeffDonahue

Page 93: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Not sure what happened there...

slidecredit:JeffDonahue

Page 94: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Vanilla RNN...

!94

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 95: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Vanilla RNN...

!95

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 96: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Vanilla RNN...

!96

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 97: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Vanilla RNN...

!97

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 98: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Vanilla RNN...

!98

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 99: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Vanilla RNN...

!99

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 100: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Long Short Term Memory (LSTM)

!100

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 101: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Long Short Term Memory (LSTM)

!101

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 102: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Long Short Term Memory (LSTM)

!102

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 103: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Long Short Term Memory (LSTM): Gradient Flow

!103

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 104: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Long Short Term Memory (LSTM): Gradient Flow

!104

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 105: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Long Short Term Memory (LSTM): Gradient Flow

!105

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 106: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Long Short Term Memory (LSTM): Gradient Flow

!106

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 107: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Alternatives

!107

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 108: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

High Level Computer Vision | Bernt Schiele & Mario Fritz

Summary

!108

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Page 109: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Visualizing and Understanding Recurrent Networks Andrej Karpathy*, Justin Johnson*, Li Fei-Fei (on arXiv.org)

slidecredit:AndrejKarpathy

Page 110: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Hunting interpretable cells

Page 111: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Hunting interpretable cells

slidecredit:AndrejKarpathy

Page 112: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Hunting interpretable cells

quote detection cell

slidecredit:AndrejKarpathy

Page 113: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Hunting interpretable cells

line length tracking cell

slidecredit:AndrejKarpathy

Page 114: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Hunting interpretable cells

if statement cell

slidecredit:AndrejKarpathy

Page 115: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Hunting interpretable cells

quote/comment cell

slidecredit:AndrejKarpathy

Page 116: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Hunting interpretable cells

code depth cell

slidecredit:AndrejKarpathy

Page 117: High Level Computer Vision Recurrent Neural …...“cat sat on mat” x2 “sat” h2 y2 x3 “on” h3 y3 x4 “mat” h4 y4 300 (learnable) numbers associated with each word in

Hunting interpretable cells

something interesting cell (not quite sure what)

slidecredit:AndrejKarpathy