Deep Learning: Theory, Techniques & Applications
- Recurrent Neural Networks -
Prof. Matteo Matteucci – [email protected]
Department of Electronics, Information and Bioengineering
Artificial Intelligence and Robotics Lab - Politecnico di Milano
Sequence Modeling

So far we have considered only «static» datasets

[Figure: a feedforward network mapping a static input x1 … xI through hidden units with weights w_ji to outputs g1(x|w) … gK(x|w)]
Sequence Modeling

So far we have considered only «static» datasets

[Figure: a sequence of input vectors X0, X1, X2, X3, …, Xt unfolding over time, each with components x1 … xI]
Sequence Modeling
Different ways to deal with «dynamic» data:
Memoryless models:
• Autoregressive models
• Feedforward neural networks
Models with memory:
• Linear dynamical systems
• Hidden Markov models
• Recurrent Neural Networks
• ...
[Figure: the sequence of input vectors X0 … Xt over time, and its abstraction as a chain X0 → X1 → X2 → X3 → … → Xt]
Memoryless Models for Sequences
Autoregressive models
• Predict the next input from previous ones using «delay taps»

Feedforward neural networks
• Generalize autoregressive models using non-linear hidden layers
[Figure: top, an autoregressive model with direct weighted connections W_{t-1}, W_{t-2} from past inputs to Xt; bottom, a feedforward network inserting a hidden layer between the past inputs and the prediction]
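The autoregressive idea above can be sketched in a few lines: fit p «delay taps» by least squares on a toy sequence. The sine signal, tap count, and all numeric values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Linear AR(p) model: predict x_t from the p previous values x_{t-1..t-p}.
rng = np.random.default_rng(0)
x = np.sin(0.3 * np.arange(200)) + 0.05 * rng.standard_normal(200)

p = 3  # number of delay taps
# Design matrix: row t holds [x_{t-1}, x_{t-2}, ..., x_{t-p}]
X = np.stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)], axis=1)
y = x[p:]

w, *_ = np.linalg.lstsq(X, y, rcond=None)  # AR coefficients by least squares
pred = X @ w
print("one-step-ahead RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```

A noiseless sinusoid is exactly linear-autoregressive with two taps, so the fit here mostly recovers that recursion plus the injected noise floor.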
Dynamical Systems
Generative models with a real-valued hidden state which cannot be observed directly
• The hidden state has some dynamics, possibly affected by noise, and produces the output
• To compute the output one has to infer the hidden state
• Inputs are treated as driving inputs

In linear dynamical systems this becomes:
• The state is continuous with Gaussian uncertainty
• Transformations are assumed to be linear
• The state can be estimated using Kalman filtering
[Figure: inputs X0 … Xt drive a chain of hidden states that emit outputs Y0 … Yt over time]

Stochastic systems ...
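A minimal 1-D sketch of the linear dynamical system and the Kalman filter mentioned above; the dynamics, emission, and noise parameters are illustrative assumptions.

```python
import numpy as np

# 1-D linear dynamical system: h_t = a*h_{t-1} + process noise,
#                              y_t = c*h_t + observation noise.
rng = np.random.default_rng(1)
a, c, q, r = 0.95, 1.0, 0.1, 0.5   # dynamics, emission, noise variances

T = 100
h = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    h[t] = a * h[t - 1] + np.sqrt(q) * rng.standard_normal()
    y[t] = c * h[t] + np.sqrt(r) * rng.standard_normal()

# Kalman filter: predict the hidden state, then correct with the observation.
mu, P = 0.0, 1.0                    # state estimate and its variance
est = np.zeros(T)
for t in range(1, T):
    mu_p, P_p = a * mu, a * a * P + q          # predict
    K = P_p * c / (c * c * P_p + r)            # Kalman gain
    mu = mu_p + K * (y[t] - c * mu_p)          # correct
    P = (1 - K * c) * P_p
    est[t] = mu

print("filter MSE:", np.mean((est - h) ** 2))
print("raw obs MSE:", np.mean((y / c - h) ** 2))
```

The filtered estimate should track the hidden state better than the raw observations, which is exactly the inference step the slide refers to.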
Dynamical Systems
In hidden Markov models this becomes:
• The state is assumed to be discrete; state transitions are stochastic (transition matrix)
• The output is a stochastic function of the hidden state
• The state can be estimated via the Viterbi algorithm
[Figure: a chain of discrete hidden states emitting observations Y0 … Yt over time]

Stochastic systems ...
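The Viterbi decoding mentioned above can be sketched for a tiny two-state HMM; the transition matrix A, emission matrix B, and initial distribution pi are illustrative values, not taken from the slides.

```python
import numpy as np

A = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # A[i, j] = P(next state j | state i)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # B[i, k] = P(observation k | state i)
pi = np.array([0.5, 0.5])           # initial state distribution

def viterbi(obs):
    T, S = len(obs), len(pi)
    # work in the log domain to avoid underflow on long sequences
    logd = np.log(pi) + np.log(B[:, obs[0]])
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)      # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    # follow back-pointers from the best final state
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1, 1, 1]))
```

With these emissions, state 0 prefers observation 0 and state 1 prefers observation 1, so the decoded path follows the switch in the observations.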
Recurrent Neural Networks

Introduce memory through recurrent connections:
• A distributed hidden state allows information to be stored efficiently
• Non-linear dynamics allow complex hidden state updates

Deterministic systems ...

“With enough neurons and time, RNNs can compute anything that can be computed by a computer.”
(Computation Beyond the Turing Limit, Hava T. Siegelmann, 1995)
[Figure: an RNN with inputs x1 … xI, hidden units h_j^t(x, W^(1), V^(1)), context units c_1^t … c_B^t fed back from the previous step, and output g^t(x|w)]
Recurrent Neural Networks

Introduce memory through recurrent connections:
• A distributed hidden state allows information to be stored efficiently
• Non-linear dynamics allow complex hidden state updates
[Figure: the same RNN diagram, now annotated with the update equations]

$$g^t(x_n \mid w) = g\left( \sum_{j=0}^{J} w_{1j}^{(2)} \, h_j^t(\cdot) + \sum_{b=0}^{B} v_{1b}^{(2)} \, c_b^t(\cdot) \right)$$

$$h_j^t(\cdot) = h_j\left( \sum_{i=0}^{I} w_{ji}^{(1)} \, x_{i,n} + \sum_{b=0}^{B} v_{jb}^{(1)} \, c_b^{t-1} \right)$$

$$c_b^t(\cdot) = c_b\left( \sum_{i=0}^{I} v_{bi}^{(1)} \, x_{i,n} + \sum_{b'=0}^{B} v_{bb'}^{(1)} \, c_{b'}^{t-1} \right)$$
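The forward pass written in these equations can be sketched in numpy; the tanh activations, linear readout, and all shapes are illustrative assumptions (biases are omitted for brevity).

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, B = 4, 5, 3                 # inputs, hidden units, context units

W1 = rng.standard_normal((J, I))  # w_ji^(1): input -> hidden
V1 = rng.standard_normal((J, B))  # v_jb^(1): previous context -> hidden
Vc = rng.standard_normal((B, I))  # v_bi^(1): input -> context
Vr = rng.standard_normal((B, B))  # v_bb'^(1): previous context -> context
W2 = rng.standard_normal((1, J))  # w_1j^(2): hidden -> output
V2 = rng.standard_normal((1, B))  # v_1b^(2): context -> output

def step(x, c_prev):
    c = np.tanh(Vc @ x + Vr @ c_prev)   # c_b^t: context update
    h = np.tanh(W1 @ x + V1 @ c_prev)   # h_j^t: hidden update
    g = W2 @ h + V2 @ c                 # g^t: (linear) readout
    return g, c

c = np.zeros(B)
for t in range(6):                      # run over a toy sequence
    x = rng.standard_normal(I)
    g, c = step(x, c)
print(g.shape, c.shape)
```

The context vector c is the only quantity carried between time steps, which is what makes the model recurrent.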
Backpropagation Through Time
[Figure: the recurrent network diagram, to be unrolled in time]
Backpropagation Through Time
[Figure: the network unrolled over several time steps; each replica of the recurrent weights receives its own copy of the inputs x1 … xI]

All these weights should be the same.
Backpropagation Through Time
• Perform network unroll for U steps
• Initialize 𝑉, 𝑉𝐵 replicas to be the same
• Compute gradients and update replicas
with the average of their gradients
[Figure: the unrolled network with replicated recurrent weights V_B^t, V_B^{t-1}, V_B^{t-2}, V_B^{t-3}]

$$V = V - \eta \cdot \frac{1}{U} \sum_{u=0}^{U-1} \frac{\partial E}{\partial V^{t-u}} \qquad\qquad V_B = V_B - \eta \cdot \frac{1}{U} \sum_{u=0}^{U-1} \frac{\partial E}{\partial V_B^{t-u}}$$
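The unroll-and-average procedure can be sketched for a scalar RNN h_t = tanh(v·h_{t-1} + w·x_t): unroll U steps, compute one gradient per replica of v, then update the shared v with their average. The loss, activation, and all numeric values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
U = 4
x = rng.standard_normal(U)
y_target = 0.5
v, w, eta = 0.3, 0.7, 0.1

# forward pass, keeping the unrolled states
h = [0.0]
for t in range(U):
    h.append(np.tanh(v * h[-1] + w * x[t]))

# backward pass: one gradient per replica of v, from the last step back
dE_dh = h[-1] - y_target                 # E = 0.5 * (h_U - y)^2
grads_v = []
for t in range(U, 0, -1):
    dpre = dE_dh * (1 - h[t] ** 2)       # through tanh at step t
    grads_v.append(dpre * h[t - 1])      # replica gradient dE/dv_t
    dE_dh = dpre * v                     # propagate back to h_{t-1}

v_new = v - eta * np.mean(grads_v)       # update with the averaged gradient
print(v_new)
```

Note that the sum of the replica gradients equals the total derivative dE/dv of the shared weight, which is why averaging (a rescaled sum) is a valid update direction.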
How much should we go back in time?
Sometimes the output might be related to some input that happened quite long before

However, backpropagation through time was not able to train recurrent neural networks significantly far back in time ...
[Figure: the recurrent network diagram]

Jane walked into the room. John walked in too. It was late in the day. Jane said hi to <???>

This was due to not being able to backpropagate through many layers ...
How much can we go back in time?
To better understand why it was not working, let us consider a simplified case:

$$y^t = w^{(2)} g(h^t), \qquad h^t = g\left(v^{(1)} h^{t-1} + w^{(1)} x\right)$$

Backpropagation over an entire sequence is computed as

$$\frac{\partial E}{\partial w} = \sum_{t=1}^{S} \frac{\partial E^t}{\partial w}, \qquad \frac{\partial E^t}{\partial w} = \sum_{k=1}^{t} \frac{\partial E^t}{\partial y^t} \frac{\partial y^t}{\partial h^t} \frac{\partial h^t}{\partial h^k} \frac{\partial h^k}{\partial w}$$

$$\frac{\partial h^t}{\partial h^k} = \prod_{i=k+1}^{t} \frac{\partial h^i}{\partial h^{i-1}} = \prod_{i=k+1}^{t} v^{(1)} g'(h^{i-1})$$

If we consider the norm of these terms

$$\left\| \frac{\partial h^i}{\partial h^{i-1}} \right\| = \left\| v^{(1)} g'(h^{i-1}) \right\|, \qquad \left\| \frac{\partial h^t}{\partial h^k} \right\| \leq \left(\gamma_v \cdot \gamma_{g'}\right)^{t-k}$$

If $(\gamma_v \gamma_{g'}) < 1$ this converges to 0 ...

With Sigmoids and Tanh we have vanishing gradients
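The bound above can be checked numerically: the factor ∂h^t/∂h^k is a product of terms v·g'(h), and with tanh (whose derivative is at most 1) it shrinks exponentially in t − k. The weight values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
v, w = 0.9, 0.5                      # recurrent and input weights
h, x = 0.0, rng.standard_normal(50)

factor = 1.0
factors = []
for t in range(50):
    pre = v * h + w * x[t]
    h = np.tanh(pre)
    factor *= v * (1 - h ** 2)       # one term v * g'(h_t), with g = tanh
    factors.append(abs(factor))

print(factors[5], factors[49])       # the running product decays toward 0
```

Since every term has magnitude at most v = 0.9 < 1, the product is bounded by 0.9^(t−k) and the gradient contribution of distant inputs vanishes.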
Dealing with Vanishing Gradient
Force all gradients to be either 0 or 1:

$$g(a) = \mathrm{ReLU}(a) = \max(0, a), \qquad g'(a) = 1_{a>0}$$

Build Recurrent Neural Networks using small modules that are designed to remember values for a long time:

$$y^t = w^{(2)} g(h^t), \qquad h^t = v^{(1)} h^{t-1} + w^{(1)} x, \qquad v^{(1)} = 1$$

It only accumulates the input ...
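The "it only accumulates the input" remark is easy to see concretely: with a linear recurrence and v^(1) = 1, the state is just the running weighted sum of the inputs, and ∂h^t/∂h^k is exactly 1, so the gradient neither vanishes nor explodes. The values below are illustrative.

```python
# Linear recurrent unit with v^(1) = 1: pure accumulation of inputs.
w1 = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy input sequence

h = 0.0
for xt in xs:
    h = 1.0 * h + w1 * xt        # h_t = h_{t-1} + w1 * x_t
print(h)                         # equals w1 * sum(xs) = 7.5
```

This is the core idea the LSTM cell refines: keep an additive path through time, but gate what is written to and read from it.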
Long-Short Term Memories
Hochreiter & Schmidhuber (1997) solved the problem of vanishing gradient designing a
memory cell using logistic and linear units with multiplicative interactions:
• Information gets into the cell
whenever its “write” gate is on.
• The information stays in the cell
so long as its “keep” gate is on.
• Information is read from the cell
by turning on its “read” gate.
We can backpropagate through this since the loop has a fixed weight.
RNN vs. LSTM

[Figure: the repeating module of a standard RNN]

Long Short Term Memory

[Figure: the LSTM cell, built up gate by gate across the original slides: input gate, forget gate, memory (cell update) gate, and output gate]
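One LSTM step with the four gates named on these slides (input i, forget f, output o, and the candidate/"memory" update g) can be sketched in numpy; the weight shapes and initialization are illustrative assumptions, and biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# one weight matrix per gate, each acting on the concatenation [x_t; h_{t-1}]
Wi, Wf, Wo, Wg = (rng.standard_normal((n_hid, n_in + n_hid)) * 0.1
                  for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    i = sigmoid(Wi @ z)              # input gate: what to write
    f = sigmoid(Wf @ z)              # forget gate: what to keep
    o = sigmoid(Wo @ z)              # output gate: what to read
    g = np.tanh(Wg @ z)              # candidate cell update
    c = f * c_prev + i * g           # cell state: the additive memory loop
    h = o * np.tanh(c)               # hidden output
    return h, c

h = np.zeros(n_hid)
c = np.zeros(n_hid)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(n_in), h, c)
print(h.shape, c.shape)
```

The additive update of c is what lets gradients flow far back in time: when f ≈ 1 and i ≈ 0, the cell simply keeps its value.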
Gated Recurrent Unit (GRU)
It combines the forget and input gates into a single “update gate.” It also merges the
cell state and hidden state, and makes some other changes.
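The merging described above can be sketched as one GRU step in numpy: the update gate z replaces the separate input/forget pair, and a single vector h plays the role of both cell and hidden state. Shapes and initialization are illustrative assumptions (biases omitted).

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

Wz, Wr, Wh = (rng.standard_normal((n_hid, n_in + n_hid)) * 0.1
              for _ in range(3))

def gru_step(x, h_prev):
    z_in = np.concatenate([x, h_prev])
    z = sigmoid(Wz @ z_in)                        # update gate
    r = sigmoid(Wr @ z_in)                        # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([x, r * h_prev]))
    return (1 - z) * h_prev + z * h_tilde         # single merged state

h = np.zeros(n_hid)
for t in range(5):
    h = gru_step(rng.standard_normal(n_in), h)
print(h.shape)
```

Since (1 − z) keeps the old state and z writes the new candidate, a single gate does the job of the LSTM's input and forget gates.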
LSTM Networks
You can build a computation graph with continuous transformations.
[Figure: a chain of hidden units over time, as in the earlier dynamical-system picture, now realized with continuous transformations; inputs X0 … Xt, outputs Y0 … Yt]
LSTM Networks
You can build a computation graph with continuous transformations.
[Figure: the same unrolled graph with stacked layers: LSTM cells feeding LSTM cells feeding ReLU units at each time step]
Sequential Data Problems
• Fixed-sized input to fixed-sized output (e.g., image classification)
• Sequence output (e.g., image captioning takes an image and outputs a sentence of words)
• Sequence input (e.g., sentiment analysis, where a given sentence is classified as expressing positive or negative sentiment)
• Sequence input and sequence output (e.g., machine translation: an RNN reads a sentence in English and then outputs a sentence in French)
• Synced sequence input and output (e.g., video classification, where we wish to label each frame of the video)
Credits: Andrej Karpathy
Sequence to Sequence Modeling
Given <S, T> pairs, read S, and output T’ that best matches T
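The read-S-then-emit-T' structure can be sketched as an encoder-decoder pair of simple RNNs: the encoder compresses S into its final hidden state, which initializes a decoder that emits the output step by step. The weights are untrained and all names and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 4, 2

We, Ve = rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid))
Wd, Vd = rng.standard_normal((n_hid, n_out)), rng.standard_normal((n_hid, n_hid))
Wo = rng.standard_normal((n_out, n_hid))

def seq2seq(source, target_len):
    h = np.zeros(n_hid)
    for x in source:                     # encode: read all of S
        h = np.tanh(We @ x + Ve @ h)
    outputs, y = [], np.zeros(n_out)
    for _ in range(target_len):          # decode: emit T' one step at a time
        h = np.tanh(Wd @ y + Vd @ h)     # feed back the previous output
        y = Wo @ h
        outputs.append(y)
    return outputs

S = [rng.standard_normal(n_in) for _ in range(5)]
T_out = seq2seq(S, 3)
print(len(T_out), T_out[0].shape)
```

Training would adjust the weights so that the emitted T' best matches the paired target T; in practice the per-timestep cells are usually LSTMs rather than plain tanh units.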
Tips & Tricks
When conditioning on a full input sequence, bidirectional RNNs can exploit it:
• Have one RNN traverse the sequence left-to-right
• Have another RNN traverse the sequence right-to-left
• Use the concatenation of their hidden layers as the feature representation
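The bidirectional recipe above can be sketched directly: run one RNN forward, one backward, and concatenate the hidden states position by position. The weights and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
Wf, Vf = rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid))
Wb, Vb = rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid))

def run(xs, W, V):
    h, out = np.zeros(n_hid), []
    for x in xs:
        h = np.tanh(W @ x + V @ h)
        out.append(h)
    return out

xs = [rng.standard_normal(n_in) for _ in range(6)]
fwd = run(xs, Wf, Vf)                 # left-to-right pass
bwd = run(xs[::-1], Wb, Vb)[::-1]     # right-to-left pass, re-aligned
feats = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
print(len(feats), feats[0].shape)
```

Each position now sees context from both the past and the future, which is valid whenever the whole input sequence is available at once.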
When initializing an RNN we need to specify the initial state:
• We could initialize it to a fixed value (such as 0)
• It is better to treat the initial state as learned parameters:
  • Start off with random guesses of the initial state values
  • Backpropagate the prediction error through time all the way to the initial state values and compute the gradient of the error with respect to them
  • Update these parameters by gradient descent
Acknowledgements
These slides are largely based on material taken from:
• Geoffrey Hinton
• Hugo Larochelle
• Andrej Karpathy
• Nando de Freitas
• Chris Olah
You can find more details in the original slides.
The amazing images of the LSTM cell are taken from Chris Olah's blog:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/