TRANSCRIPT
Deep Learning, Echo State Networks and the Edge of Chaos
Artificial Intelligence Seminar, 1 November 2016. Filip Matzner, František Mráz
Outline
Neuron
artificial vs. biological
Network Architectures
biological neural networks are recurrent
artificial recurrent networks are hard to train
⟹ feed-forward networks receive more attention
[figure: feed-forward vs. recurrent topology]
Outline
Feed-Forward Networks (Pre-2010)
shallow = no more than a single hidden layer
sigmoidal activation function
mean squared error cost function
Feed-Forward Supervised Training
gradient descent: a method for function minimization
idea: put a ball on the cost function surface and let it roll down
it seems too simple; where is the catch?
in calculating the derivatives for each weight
[figure: cost as a function of the weights]
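The ball-rolling idea fits in a few lines. Below is a minimal sketch with a made-up quadratic cost and learning rate, purely for illustration:

```python
# Minimal gradient descent sketch: "roll the ball" down a made-up quadratic
# cost C(w) = (w - 3)^2 by repeatedly stepping against the gradient.

def gradient_descent(grad, w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)        # move downhill
    return w

# dC/dw = 2 * (w - 3); the minimum is at w = 3
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

With a learning rate that is too large the "ball" overshoots and diverges; too small and it crawls.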
Backpropagation
a gradient descent method specialized for neural networks
efficiently propagates the error gradient from the output layer to the input layer
calculates all the derivatives in a single pass
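As an illustration of how the error gradient flows from the output layer back to the input layer, here is a hand-derived backward pass for a tiny chain of two sigmoid neurons; the input, target, and weights are made up, and real frameworks generalize this to arbitrary layer graphs:

```python
import math

# A hand-derived backward pass for a chain of two sigmoid neurons
# (input -> hidden -> output). The point is the mechanics: one forward pass,
# then one backward pass that reuses the output-layer error "delta" to get
# every derivative.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward_backward(x, y, w1, w2):
    """Gradients of the MSE cost C = (a2 - y)^2 / 2 w.r.t. w1 and w2."""
    # forward pass
    z1 = w1 * x
    a1 = sigmoid(z1)
    z2 = w2 * a1
    a2 = sigmoid(z2)
    # backward pass: the error delta flows output -> input
    delta2 = (a2 - y) * sigmoid_prime(z2)
    delta1 = delta2 * w2 * sigmoid_prime(z1)
    return delta1 * x, delta2 * a1

g1, g2 = forward_backward(x=1.0, y=0.0, w1=0.5, w2=0.5)
```

The analytic gradients agree with numerical differentiation of the cost, which is the standard sanity check for a backprop implementation.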
Vanishing Gradient
why can the networks not be deeper?
imagine a simple network with three hidden layers
now plot the derivative of the sigmoid function
usually |w·σ′(z)| < 1/4, thus the gradient decreases exponentially with each layer
problem first described by Bengio et al. [1994]; this simple description taken from Nielsen [2015]
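The exponential decay is easy to see numerically. A toy sketch: a chain of sigmoid neurons, all sitting at z = 0 with a hypothetical weight of 1, so each layer multiplies the gradient by σ′(0) = 1/4:

```python
import math

# Numerical sketch of the vanishing gradient: in a chain of sigmoid layers
# the backpropagated gradient picks up a factor w * sigmoid'(z) per layer,
# and sigmoid'(z) <= 1/4 everywhere. Ten layers shrink it by 4**10.

def sigmoid_prime(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

w = 1.0
gradient = 1.0
for _ in range(10):
    gradient *= w * sigmoid_prime(0.0)   # sigmoid'(0) = 1/4

print(gradient)   # 0.25 ** 10, on the order of 1e-6
```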
Outline
Feed-Forward Neuroevolution
in the simplest possible scenario:
consider the network weights to be a vector of real numbers
evolve the weights by a basic genetic algorithm
this does not work particularly well; specialized evolutionary algorithms exist
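The "simplest possible scenario" might look like this. The fitness function is a stand-in: a real run would score each weight vector by the network's task performance, and the target vector below is entirely made up:

```python
import random

# A bare-bones genetic algorithm over a real-valued weight vector:
# truncation selection plus Gaussian mutation, nothing else.

TARGET = [0.5, -1.2, 2.0]    # hypothetical "ideal" weights

def fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def evolve(pop_size=50, generations=200, sigma=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-3.0, 3.0) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 5]                 # keep the top 20%
        pop = [[w + rng.gauss(0.0, sigma) for w in rng.choice(parents)]
               for _ in range(pop_size)]
    return max(pop, key=fitness)

best = evolve()
```

This mirrors the slide's point: it does find good weights on a trivial landscape, but scales poorly, which is why specialized algorithms such as NEAT exist.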
NEAT
one of the many existing neuroevolutionary algorithms
good for smaller networks, not as much for large ones
can evolve both feed-forward and recurrent networks
even though it is quite old, it provides a suitable baseline
Stanley and Miikkulainen [2002]
HyperNEAT
an extension of NEAT; can evolve larger networks
good when the network contains a lot of regularities
Stanley et al. [2009]
Outline
Feed-Forward Networks (Post-2010)
deep = more than two hidden layers
GPU implementations (100x speedup)
ReLU activation function
Cross-Entropy Cost Function
saturated neurons learn slowly with MSE even though their error might be the largest
why? because of the shape of the sigmoid function: the derivative at saturation is small
this can be partially avoided by using the cross-entropy cost function
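The slowdown is easy to see numerically. A single-neuron sketch with made-up values: for a saturated sigmoid neuron the MSE gradient carries a σ′(z) factor, while the cross-entropy gradient reduces to (a − y):

```python
import math

# A saturated sigmoid neuron (z = 5, so a ~ 0.993) with target y = 0 has a
# large error, yet the MSE gradient is tiny because of the sigmoid'(z)
# factor. The cross-entropy gradient is just (a - y) and stays large.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z, y = 5.0, 0.0
a = sigmoid(z)

mse_grad = (a - y) * a * (1.0 - a)   # d/dz of (a - y)^2 / 2
ce_grad = a - y                      # d/dz of -[y ln a + (1-y) ln(1-a)]

print(mse_grad, ce_grad)             # MSE gradient is tiny despite the large error
```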
Softmax Activation
another way to address the learning slowdown, especially when combined with the cross-entropy cost function
emphasizes the neuron with the maximum activation but does not ignore the other neurons
can be thought of as a probability distribution, because the outputs are positive and sum to 1
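A minimal softmax sketch (the max-subtraction trick for numerical stability is a standard detail, not something from the slide):

```python
import math

# Softmax: exponentiate, then normalize. The outputs are positive and sum
# to 1, so they read as a probability distribution; the largest input gets
# the most mass without the others dropping to zero.

def softmax(zs):
    m = max(zs)                            # subtract the max for stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
```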
Convolutional Layers
apply the same convolutional kernel to all the pixels in the image
a way to share weights between neurons ⟹ regularization
inspired by nature
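A convolutional layer boils down to sliding one small kernel over the whole image, so every output position reuses the same few weights. A naive sketch; the 4x4 image and the vertical-edge kernel are made up:

```python
# Slide a 3x3 kernel over a grayscale image; the same weights are reused at
# every position (this is the weight sharing the slide mentions).

def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
    return out

# a tiny image with a vertical edge, and a kernel that responds to it
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
feature_map = convolve2d(image, kernel)
```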
Max Pooling
similar to convolution, but takes only the maximum of the receptive field
a good way to subsample the image (i.e., fewer parameters to train)
forces a more succinct image representation (i.e., compression)
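A sketch of 2x2 max pooling with stride 2, the most common variant: keep only the maximum of each receptive field, halving the image along each axis:

```python
# 2x2 max pooling, stride 2: one maximum survives per 2x2 block.

def max_pool2x2(image):
    return [
        [
            max(image[i][j], image[i][j + 1],
                image[i + 1][j], image[i + 1][j + 1])
            for j in range(0, len(image[0]), 2)
        ]
        for i in range(0, len(image), 2)
    ]

pooled = max_pool2x2([[1, 3, 2, 0],
                      [4, 2, 1, 1],
                      [0, 0, 5, 6],
                      [0, 0, 7, 8]])
```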
Dropout
randomly disable some of the neurons in each backprop step
Hinton et al. [2012]
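A sketch of the idea in its common "inverted dropout" form; the 1/(1 − p) rescaling of survivors is a standard detail not mentioned on the slide:

```python
import random

# Inverted dropout: during training, zero each activation with probability p
# and scale the survivors by 1/(1 - p) so the expected activation is unchanged.

def dropout(activations, p, rng):
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]

rng = random.Random(0)
dropped = dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=rng)
```

At test time dropout is simply disabled; with the inverted form no extra rescaling is needed then.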
Data Augmentation
enlarge the dataset by transforming the data so that they still make sense
horizontal flip, rotation, scale, color inversion, ...
reduces overfitting
Outline
Supervised Training
backprop variants, e.g.,
RMSProp (unpublished; see Hinton's Neural Networks course on Coursera [2012])
Adam (Kingma and Ba [2015])
the derivatives do not have to be calculated manually; your favourite deep learning framework does it for you
Neuroevolution
the networks are usually too large to be evolved purely by neuroevolution
even a combination of neuroevolution and supervised training is unusual
a single training run on the GPU usually takes hours or days
Outline
ILSVRC
● benchmark of image classification models
● ~1.5 million images
● 1000 classes
Russakovsky et al. [2015]
[chart: ILSVRC top-5 classification error by year, with human-level performance marked]
AlexNet
● the winner of ILSVRC image classification 2012
● 8 layers
● 15.3% top-5 error
● 60 million parameters, 2 GPUs
● the beginning of the neural-network computer vision era
Krizhevsky et al. [2012]
AlexNet
Krizhevsky et al. [2012]
● one GPU learned color filters, the other learned black-and-white filters
AlexNet - Classification Results
Krizhevsky et al. [2012]
AlexNet - Image Similarity
Krizhevsky et al. [2012]
Previous best by Zezula et al. [2005]
GoogLeNet
Szegedy et al. [2014]
● the winner of ILSVRC image classification 2014
● 22 layers
● 6.67% top-5 error
● Google
Residual Networks
● the winner of ILSVRC image classification 2015
● 152 layers
● 4.49% top-5 error
● Microsoft Research
Kaiming He et al. [2015]
example of a 34-layer residual network
Image Captioning
Lebret et al. [2015]
Outline
Recurrent Networks
can be unfolded through time and reduced to feed-forward networks
⟹ training by backpropagation (through time)
problem: implicitly infinitely deep
⟹ the vanishing gradient is even more significant
partial solutions: LSTM, GRU, echo state networks
Geometrical Representation of the Exploding Gradient
consider the dynamical system x_t = w·σ(x_{t−1}) + b
Fig. 6 illustrates the error surface E_50 = (σ(x_50) − 0.7)²
Pascanu et al. [2013]
possible solution: limit the gradient norm
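Limiting the gradient norm ("gradient clipping") can be sketched independently of the network; the gradient values below are made-up stand-ins for backprop outputs:

```python
# Gradient clipping: whenever the gradient's norm exceeds a threshold,
# rescale it so it sits exactly on the threshold; small gradients pass
# through untouched.

def clip(gradient, threshold):
    norm = abs(gradient)        # scalar case; use the vector norm in general
    if norm > threshold:
        return gradient * threshold / norm
    return gradient

grads = [0.3, 1.7, 250.0, -4000.0]
clipped = [clip(g, threshold=1.0) for g in grads]
```

Clipping changes the step size but not the step direction, which is why it helps near the steep walls of the error surface that Pascanu et al. describe.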
Outline
Long Short-Term Memory (LSTM)
avoids the vanishing gradient
uses a special neuron with a “memory” ⟹ able to capture long-term dependencies
the de facto standard in recurrent neural networks
Hochreiter and Schmidhuber [1997]
Long Short-Term Memory (LSTM)
Hochreiter and Schmidhuber [1997]
Gated Recurrent Unit (GRU)
a slightly “simplified” version of LSTM
performance is comparable with LSTM
Cho et al. [2014]; comparison with LSTM by Chung et al. [2014]
Outline
Handwriting Recognition
Graves et al. [2009]
Handwriting Generation
http://www.cs.toronto.edu/~graves/handwriting.html, Graves [2014]
Speech Recognition
https://googleblog.blogspot.cz/2015/07/neon-prescription-or-rather-new.html Graves et al. [2013]
“Using a LSTM, we cut our transcription errors by 49%.” -- Google Voice Blogspot [2015]
Robot Control
Mayer et al. [2008]
And more...
machine translation
image caption generation
...
Outline
Echo State Networks
what about a totally random network?
Jaeger [2001]
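A minimal echo state network sketch, assuming the usual recipe: a fixed random reservoir scaled to be contracting, and only a linear readout trained. All sizes, scalings, and the delay-1 toy task are made up, and the readout is trained here by plain gradient descent rather than the usual least-squares solve:

```python
import math
import random

# Echo state network sketch: the input and reservoir weights are random and
# stay fixed; only the linear readout is trained. The toy task -- output the
# input from one step ago -- needs the reservoir's short-term memory.

N = 20                                   # reservoir size (arbitrary)
rng = random.Random(1)
W_in = [rng.uniform(-0.5, 0.5) for _ in range(N)]
W = [[rng.uniform(-1.0, 1.0) * 0.9 / math.sqrt(N) for _ in range(N)]
     for _ in range(N)]                  # crude scaling toward the echo state property

def run_reservoir(inputs):
    x = [0.0] * N
    states = []
    for u in inputs:
        x = [math.tanh(W_in[i] * u + sum(W[i][j] * x[j] for j in range(N)))
             for i in range(N)]
        states.append(x)
    return states

inputs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
targets = [0.0] + inputs[:-1]            # delay-1 memory task
states = run_reservoir(inputs)

w_out = [0.0] * N                        # the only trained weights
for _ in range(200):
    for s, y in zip(states[50:], targets[50:]):   # skip the washout period
        err = sum(wi * si for wi, si in zip(w_out, s)) - y
        w_out = [wi - 0.05 * err * si for wi, si in zip(w_out, s)]

preds = [sum(wi * si for wi, si in zip(w_out, s)) for s in states[50:]]
mse = sum((p - y) ** 2 for p, y in zip(preds, targets[50:])) / len(preds)
```

Even with the reservoir left completely random, the trained readout recovers the delayed input far better than guessing, which is the core ESN observation.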
Biological Motivation
ESNs do not seem to be biologically plausible
improve them with a different topology?
⟹ neuroevolution (HyperNEAT)
Neuroevolution
5 runs, 2000 generations, 150 individuals
Neuroevolution
the best network has only local connections
what about a locally connected random network?
⟹ a fast alternative to neuroevolution
Dynamical Systems
recurrent networks are dynamical systems
⟹ we can measure the amount of chaos (Lyapunov exponent)
Sprott [2015]
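The Lyapunov exponent of the simplest possible "network", the scalar map x_{t+1} = tanh(w·x_t), can be estimated by averaging the log of the local derivative along a trajectory. A one-dimensional sketch with made-up parameters: this scalar map itself is never chaotic, but its exponent rising toward zero as w → 1 mirrors the approach to the edge of chaos:

```python
import math

# Estimate the largest Lyapunov exponent of x_{t+1} = tanh(w * x_t).
# A negative exponent means perturbations die out (ordered dynamics); in
# larger networks a positive exponent indicates chaos.

def lyapunov(w, x0=0.3, steps=2000):
    x, total = x0, 0.0
    for _ in range(steps):
        y = math.tanh(w * x)
        # derivative of tanh(w*x) w.r.t. x is w * (1 - tanh(w*x)^2)
        total += math.log(abs(w * (1.0 - y * y)) + 1e-12)
        x = y
    return total / steps

ordered = lyapunov(0.5)      # strongly contracting: clearly negative
near_edge = lyapunov(1.0)    # close to the edge: exponent near zero
```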
Chaos
Chaos
recurrent networks have multiple levels of chaos
relation between dynamics and performance
the edge of chaos
Bertschinger and Natschlager [2004], Boedecker et al. [2012]
Chaos
neuroevolution prefers networks close to the edge of chaos
Conclusion
neural networks represent a very strong artificial intelligence model
there has been a huge step forward in the last decade
recurrent networks still have a long way to go
neural networks are good at the same things as humans
neural networks are bad at the same things as humans
Questions?
Recommended Reading
1) Neural Networks and Deep Learning - Michael Nielsen
A well-written online book covering the basic topics of feed-forward neural networks. Freely available.
http://neuralnetworksanddeeplearning.com/
2) ImageNet Classification with Deep Convolutional Neural Networks - Krizhevsky et al. [2012]
One of the first great successes of deep convolutional networks. A short and clear paper; however, it assumes knowledge of backprop, max-pooling, dropout, etc.
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
3) Deep Learning - Ian Goodfellow, Yoshua Bengio and Aaron Courville [in preparation, 2016]
A book in preparation, available online. It is written by a few of the best deep learning scientists and describes most of the up-to-date techniques.
http://www.deeplearningbook.org/
4) Deep Learning in Neural Networks: An Overview - Schmidhuber [2014]
A huge survey of neural networks.
http://arxiv.org/pdf/1404.7828v4