Recurrent Neural Networks
11-785 / 2020 Spring / Recitation 7
Vedant Sanil, David Park
“Drop your RNN and LSTM, they are no good!”
The Fall of RNN / LSTM, Eugenio Culurciello
Wise words to live by indeed
Content
• 1 Language Model
• 2 RNNs in PyTorch
• 3 Training RNNs
• 4 Generation with an RNN
• 5 Variable length inputs
A recurrent neural network and the unfolding in time of the computation involved in its forward computation.
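For reference, the recurrence computed at each unrolled step (as in PyTorch's nn.RNN, with the default tanh activation) is:

$$h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$$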
RNNs Are Hard to Train
What isn’t? I had to spend a week training an MLP :(
Different Tasks
Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). Input vectors are in red, output vectors are in blue, and green vectors hold the RNN's state (more on this later).
Different Tasks
One-to-One
A simple MLP, no recurrence
One-to-Many
Example, Image Captioning: Given a single image, generate a sequence of words
Different Tasks
Many-to-One
Example, Sentiment Analysis: Given a sentence, classify its sentiment as positive or negative
Many-to-Many
Example, Machine Translation: Given an input sentence in one language, convert it to another language
A history of machine translation from the Cold War to deep learning: https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/
Language Models
• Goal: predict the “probability of a sentence” P(E)
• How likely it is to be an actual sentence
Credits to cs224n
A lot of jargon, to basically say:
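The probability of a sentence $E = (e_1, \ldots, e_T)$ factors one word at a time via the chain rule, so the model only ever has to predict the next word given its history:

$$P(E) = \prod_{t=1}^{T} P(e_t \mid e_1, \ldots, e_{t-1})$$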
An RNN Language Model
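A minimal sketch of a typical RNN language model in PyTorch (the class name and all sizes are illustrative assumptions, not necessarily what the recitation used):

```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_size=256, hidden_size=512, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)    # token ids -> dense vectors
        self.rnn = nn.LSTM(embed_size, hidden_size, num_layers)  # expects (seq_len, batch, embed_size)
        self.classifier = nn.Linear(hidden_size, vocab_size)     # hidden state -> next-word logits

    def forward(self, x, hidden=None):
        # x: (seq_len, batch) of token indices
        emb = self.embedding(x)              # (seq_len, batch, embed_size)
        out, hidden = self.rnn(emb, hidden)  # out: (seq_len, batch, hidden_size)
        logits = self.classifier(out)        # (seq_len, batch, vocab_size)
        return logits, hidden
```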
RNN module in PyTorch
RNN modules in PyTorch
• Important: the outputs are exactly the hidden states of the final layer. Hence, if the model returns y, h:
• y: (seq_len, batch, num_directions * hidden_size)
• h: (num_layers * num_directions, batch, hidden_size)
• With num_directions = 1, y[-1] == h[-1] (see the sketch below)
• LSTM returns a tuple in place of h:
• (h_n, c_n): (hidden_states, memory_cell); GRU, like the plain RNN, returns only h_n
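A small sketch checking those shapes (all sizes chosen arbitrarily for illustration):

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
rnn = nn.RNN(input_size, hidden_size, num_layers)  # num_directions = 1 by default

x = torch.randn(seq_len, batch, input_size)
y, h = rnn(x)

print(y.shape)                       # torch.Size([5, 3, 20]) -> (seq_len, batch, hidden_size)
print(h.shape)                       # torch.Size([2, 3, 20]) -> (num_layers, batch, hidden_size)
print(torch.allclose(y[-1], h[-1]))  # True: last output == final layer's last hidden state
```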
This can be better visualized by unrolling the network over both time steps and layers.
RNN cell
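nn.RNNCell computes a single time step, so you write the loop over time yourself. A minimal sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

input_size, hidden_size, batch = 10, 20, 3
cell = nn.RNNCell(input_size, hidden_size)

x = torch.randn(5, batch, input_size)  # (seq_len, batch, input_size)
h = torch.zeros(batch, hidden_size)    # initial hidden state

outputs = []
for t in range(x.size(0)):             # manual unrolling over time
    h = cell(x[t], h)                  # one step: h = tanh(W_ih x_t + b_ih + W_hh h + b_hh)
    outputs.append(h)
y = torch.stack(outputs)               # (seq_len, batch, hidden_size)
```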
Embedding Layers
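An embedding layer is a learned lookup table from token indices to dense vectors. A minimal sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

vocab_size, embed_size = 1000, 64
embedding = nn.Embedding(vocab_size, embed_size)  # a learnable (1000, 64) weight matrix

tokens = torch.tensor([[1, 5, 42], [7, 0, 999]])  # (batch, seq_len) of token indices
vectors = embedding(tokens)                       # (batch, seq_len, embed_size)
print(vectors.shape)                              # torch.Size([2, 3, 64])
```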
Training a Language Model
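A minimal sketch of one training step, assuming the hypothetical RNNLanguageModel above and batches where targets are the inputs shifted left by one token:

```python
import torch.nn as nn
import torch.optim as optim

model = RNNLanguageModel(vocab_size=1000)  # hypothetical model from the sketch above
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

def train_step(inputs, targets):
    # inputs, targets: (seq_len, batch); targets[t] is the word after inputs[t]
    optimizer.zero_grad()
    logits, _ = model(inputs)  # (seq_len, batch, vocab_size)
    loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()
```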
Evaluate your model
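Language models are usually evaluated by perplexity, the exponential of the average per-token cross-entropy (lower is better). A sketch, assuming a loader that yields (inputs, targets) batches shaped like the training step above:

```python
import math
import torch

@torch.no_grad()
def evaluate(model, data_loader, criterion):
    model.eval()
    total_loss, total_tokens = 0.0, 0
    for inputs, targets in data_loader:
        logits, _ = model(inputs)  # (seq_len, batch, vocab_size)
        loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        total_loss += loss.item() * targets.numel()  # undo the 'mean' reduction
        total_tokens += targets.numel()
    return math.exp(total_loss / total_tokens)  # perplexity
```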
Generation
Greedy Search, Random Search and Beam Search
1. Greedy search: select the most likely word
2. Random search: sample a word from the distribution
3. Beam search: keep the n best candidate sequences at each step, where n is the beam size
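A minimal sketch of greedy and random-search decoding with the hypothetical model above (beam search, which tracks the n best partial sequences and their scores, is omitted for brevity):

```python
import torch

@torch.no_grad()
def generate(model, start_token, max_len=20, greedy=True):
    model.eval()
    tokens = [start_token]
    hidden = None
    x = torch.tensor([[start_token]])  # (seq_len=1, batch=1)
    for _ in range(max_len):
        logits, hidden = model(x, hidden)      # logits: (1, 1, vocab_size)
        probs = logits[-1, 0].softmax(dim=-1)  # next-word distribution
        if greedy:
            next_token = probs.argmax().item()               # 1. most likely word
        else:
            next_token = torch.multinomial(probs, 1).item()  # 2. sample from distribution
        tokens.append(next_token)
        x = torch.tensor([[next_token]])
    return tokens
```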
Are we done?
• No.
How to train a LM: fixed length
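A common fixed-length setup (sketched here as an assumption about what the slide showed) concatenates the corpus into one long token stream and slices it into equal-length input/target chunks:

```python
import torch

def make_fixed_length_chunks(token_stream, seq_len):
    # token_stream: 1-D LongTensor of the whole corpus concatenated
    n_chunks = (len(token_stream) - 1) // seq_len
    inputs = token_stream[:n_chunks * seq_len].view(n_chunks, seq_len)
    targets = token_stream[1 : n_chunks * seq_len + 1].view(n_chunks, seq_len)  # shifted by one
    return inputs, targets
```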
Limits of fixed-length inputs
How to train a LM: Variable Length
• Your dataset is now a list of N sequences of different lengths
• But the RNN input has a fixed dimension (seq_len, batch, input_size)
• How could we deal with this situation?
1. pad_sequence
2. pack_sequence
pad_sequence
Notice how seq_len differs across the sequences, but input_size is the same. After padding, the sequences are stacked into a single tensor of shape:
torch.Size([4, 3, 2])
(seq_len, batch, input_size)
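A sketch reproducing that shape with torch.nn.utils.rnn.pad_sequence; the individual lengths 4, 3, and 1 are assumptions consistent with the shapes here and with the batch_sizes shown later:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

x1 = torch.randn(4, 2)  # length 4, input_size 2
x2 = torch.randn(3, 2)  # length 3
x3 = torch.randn(1, 2)  # length 1

padded = pad_sequence([x1, x2, x3])  # zero-pads up to the longest sequence
print(padded.shape)                  # torch.Size([4, 3, 2]) -> (seq_len, batch, input_size)
```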
pack_sequence
packed_2 = pack_sequence([x1, x2, x3], enforce_sorted=True)
packed = pack_sequence([x1, x2, x3], enforce_sorted=False)
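In context (a sketch reusing x1, x2, x3 from the pad_sequence example above; enforce_sorted=True requires the inputs to already be sorted by decreasing length, which they are here):

```python
from torch.nn.utils.rnn import pack_sequence

packed = pack_sequence([x1, x2, x3], enforce_sorted=False)
print(packed.data.shape)   # torch.Size([8, 2]): 4 + 3 + 1 real time steps, no padding stored
print(packed.batch_sizes)  # tensor([3, 2, 2, 1]): active sequences at each time step
```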
pack_padded_sequence and pad_packed_sequence
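These two functions convert between the padded and packed representations; pack_padded_sequence needs the true (unpadded) lengths, and pad_packed_sequence returns them. A sketch reusing the padded tensor from above:

```python
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

packed = pack_padded_sequence(padded, lengths=[4, 3, 1], enforce_sorted=False)

unpacked, out_lengths = pad_packed_sequence(packed)
print(unpacked.shape)  # torch.Size([4, 3, 2]): back to (seq_len, batch, input_size)
print(out_lengths)     # tensor([4, 3, 1])
```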
Why is batch_sizes = tensor([3, 2, 2, 1]) here? With three sequences of lengths 4, 3, and 1, all three are active at time step 0, two at steps 1 and 2, and only the longest at step 3.
From the PackedSequence docs: batch_sizes (Tensor): tensor of integers holding information about the batch size at each sequence step. For instance, given data ``abc`` and ``x``, the :class:`PackedSequence` would contain data ``axbc`` with ``batch_sizes=[2,1,1]``.
Packed Sequences and RNNs
• Packed sequences are on the same device as the padded sequence
• Packed sequences let your RNN know the true length of each instance, so the recurrence stops at each sequence's end instead of running over padding (see the sketch below)
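A sketch of feeding the packed batch from above through an LSTM; because the lengths are known, h_n holds each sequence's hidden state at its true last step rather than at a padded position:

```python
import torch.nn as nn
from torch.nn.utils.rnn import pad_packed_sequence

lstm = nn.LSTM(input_size=2, hidden_size=5)
out_packed, (h_n, c_n) = lstm(packed)        # RNN modules accept a PackedSequence directly

out, lens = pad_packed_sequence(out_packed)  # back to padded: (seq_len, batch, hidden_size)
print(out.shape)                             # torch.Size([4, 3, 5])
print(h_n.shape)                             # torch.Size([1, 3, 5]); one state per sequence
```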
MLP, RNN, CNN, Transformer
• All these layers are just feature extractors
• Temporal convolutional network (TCN) “outperform canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory”
(An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling)
Is that enough?
• RNN, LSTM, GRU
• Transformer (will be covered in future lectures and recitations)
• CNN
Credits: Attention is all you need
Papers
• https://arxiv.org/pdf/1708.02182.pdf
• For more tricks, see Regularizing and Optimizing LSTM Language Models
• Attention is all you need!
• https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0
• LSTM song!!!