Neural Networks, etc. - cis.uni-muenchen.defraser/mt_2020/09...
TRANSCRIPT
Topics
1. Activation functions
2. Training of neural networks
3. Recurrent networks
4. LSTM
Training a Neural Network
• Neurons
• Loss Function
• Backpropagation
• Pragmatics
The Base Level: Neuron
Activation Function
Obvious:
Linear: A = cx
But:
1. Derivative is constant
2. Stacking layers no longer works
So we need nonlinear activation functions.
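The second point can be checked directly: without a nonlinearity, two stacked layers collapse into a single linear map. A minimal NumPy sketch (weights and sizes are arbitrary illustrations):

```python
import numpy as np

# Two stacked "linear" layers with no activation function...
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)

two_layers = W2 @ (W1 @ x)
# ...are equivalent to one linear layer with weight W2 @ W1:
one_layer = (W2 @ W1) @ x

assert np.allclose(two_layers, one_layer)
```

However many linear layers you stack, the network can only represent a linear function, which is why a nonlinear activation is needed.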
sigmoid() (and softmax)
tanh() (zero-centered)
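The squashing functions named on these slides can be sketched in a few lines of NumPy (tanh is built in; the input vector is just an illustration):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real value into (0, 1); outputs are not zero-centered.
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    # Generalizes sigmoid to a probability distribution over classes.
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

x = np.array([-2.0, 0.0, 2.0])
s = sigmoid(x)   # values in (0, 1)
t = np.tanh(x)   # values in (-1, 1), zero-centered
p = softmax(x)   # non-negative, sums to 1
```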
Vanishing Gradient
Gradient Descent is used for training Neural Networks
o(x) = f_n(f_{n-1}(... f_1(x) ...))
Multiplying values < 1 across multiple layers causes a VANISHING GRADIENT.
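The effect is easy to see numerically: the derivative of the sigmoid is at most 0.25, so the chain-rule product through a deep stack of sigmoid layers shrinks geometrically with depth (a toy illustration, not a real network):

```python
# Sigmoid's derivative peaks at 0.25, so the best case for the
# chain-rule product across n layers is 0.25 ** n.
max_sigmoid_grad = 0.25
for n_layers in (1, 5, 10, 20):
    print(n_layers, max_sigmoid_grad ** n_layers)
```

By 20 layers the upper bound on the gradient is already below 1e-12, so early layers receive essentially no training signal.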
Avoid Vanishing Gradient with ReLU
Avoiding “dying neurons”: Leaky ReLU
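Both activations fit in a couple of lines; the alpha value below is one common choice, not a fixed standard:

```python
import numpy as np

def relu(x):
    # Gradient is 1 for x > 0, so it does not vanish there, but a
    # neuron stuck at x <= 0 gets zero gradient and can "die".
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # A small slope for x <= 0 keeps some gradient flowing.
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, 0.0, 2.0])
r = relu(x)         # [0., 0., 2.]
lr = leaky_relu(x)  # [-0.03, 0., 2.]
```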
Loss Function
Mean Squared Error:
Cross Entropy Loss:
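The two losses can be sketched as follows (a NumPy illustration, with toy inputs; the formulas shown on the original slides are not in this transcript):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared differences.
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(true_class, probs):
    # Negative log of the probability assigned to the correct class.
    return -np.log(probs[true_class])

err = mse(np.array([1.0, 0.0]), np.array([0.9, 0.2]))  # 0.025
ce = cross_entropy(0, np.array([1.0, 0.0]))            # 0.0
```

Note that assigning probability 1 to the correct class gives a cross-entropy loss of exactly 0, matching the point on the next slide.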
How to use loss? Train your network while loss is decreasing.
Perfect probability = loss of 0.
Loss isn’t very intuitive on its own; it is often better to also track accuracy or another task-level metric.
Backpropagation
1. Forward pass
2. Error calculation
3. Backward pass
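The three steps can be traced by hand for a single sigmoid neuron trained with squared error and plain gradient descent. This is a toy sketch; all values (weights, learning rate, input) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.standard_normal(2), 0.0
x, y = np.array([0.5, -1.0]), 1.0
lr = 0.5

for step in range(100):
    # 1. Forward pass
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))   # sigmoid activation
    # 2. Error calculation (squared error for a single example)
    loss = (p - y) ** 2
    # 3. Backward pass: chain rule through the loss and the sigmoid
    grad_z = 2 * (p - y) * p * (1 - p)
    w -= lr * grad_z * x
    b -= lr * grad_z
```

After the loop the loss is close to zero; real frameworks such as PyTorch automate exactly this gradient computation across many layers.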
How are Neural Networks Trained in Practice?
Data
● Large!
Tools
● PyTorch
● AllenNLP
Hardware
● GPUs (and even TPUs)
Neural Network Data
1. Neural networks often have thousands of parameters.
2. With enough data, the law of large numbers smooths over inconsistencies.
3. Beware of biases.
4. On small datasets in my own work, I’ve personally had close results with logistic regression models.
Where do you get data?
● Internet
● Books
● Crowd-Sourcing
● Artificial modifications
● Specialized Communities
But how do you guarantee quality?
● Interannotator Agreement
● Think about biases:
○ Label: you only learn what’s in the training data
○ Language: skewed towards popular languages
○ Text: text data requires less space than audio/video data and can be older
● Visualization
But what about language?
Neural networks were a big leap in accuracy for VISION: pixels are high-dimensional and difficult to interpret. Language, by contrast, is human interpretable.
Levels:
1. Character
2. Word
3. Phrase/Sentence
4. Document
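The same text can be viewed at each of these granularities; a naive Python sketch (real pipelines use proper tokenizers, and whitespace splitting is only an approximation of word level):

```python
text = "Neural networks learn language."

chars = list(text)    # character level: 31 symbols
words = text.split()  # word level (naive whitespace split): 4 tokens
document = [text]     # document level: the whole text as one unit
```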
Context Matters
Pragmatics
1. Train/development/test splits
2. Batching
3. Random seed
4. Reasonable significant digits
5. Dropout during training
6. Initialization
7. Human baselines & common sense
8. Monitor training loss
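The first and third points above can be sketched together: a fixed seed makes a shuffled split reproducible. The 80/10/10 ratio below is a common convention, used here only as an illustration:

```python
import random

data = list(range(100))  # stand-in for 100 labeled examples

random.seed(42)          # fixed random seed: the split is reproducible
random.shuffle(data)

# 80/10/10 train/development/test split
train, dev, test = data[:80], data[80:90], data[90:]

# No example should leak between training and test data.
assert not set(train) & set(test)
```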
Recurrent Neural Networks (RNN)
Limitations
W is a weight matrix and h the hidden vector: h_t = tanh(Whh · h_{t-1} + Whx · x_t), where Whh is the weight applied to the previous hidden state, Whx the weight applied to the current input, and tanh is the activation function, a non-linearity that squashes the activations to the range [-1, 1].
● Vanishing Gradient: losing long-term information
● Computation
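The recurrence is small enough to write out directly. A NumPy sketch of one RNN rolling over a sequence (sizes and random weights are illustrative; real implementations also include bias terms and an output layer):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
Whh = rng.standard_normal((hidden, hidden)) * 0.1  # recurrent weights
Whx = rng.standard_normal((hidden, inputs)) * 0.1  # input weights

h = np.zeros(hidden)
for x in rng.standard_normal((5, inputs)):  # a sequence of 5 input vectors
    # h_t = tanh(Whh @ h_{t-1} + Whx @ x_t), squashed into (-1, 1)
    h = np.tanh(Whh @ h + Whx @ x)
```

Because the same Whh multiplies the state at every step, gradients through long sequences repeatedly pass through this matrix and the tanh, which is where the vanishing-gradient problem above comes from.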
Add Gates: Long Short-Term Memory (LSTM)
Input: Is this relevant?
Cell State: What to add?
Output: Where to send next?
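One step of a standard LSTM cell can be sketched in NumPy to show how the three gates interact with the cell state (weights, sizes, and the missing bias terms are simplifications for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W):
    # Each gate sees the current input and the previous hidden state.
    v = np.concatenate([x, h])
    i = sigmoid(W["i"] @ v)   # input gate: is this relevant?
    f = sigmoid(W["f"] @ v)   # forget gate: what to keep in the cell?
    o = sigmoid(W["o"] @ v)   # output gate: where to send next?
    g = np.tanh(W["g"] @ v)   # candidate values to add
    c = f * c + i * g         # cell state: forget, then add
    h = o * np.tanh(c)        # new hidden state
    return h, c

rng = np.random.default_rng(0)
n, d = 4, 3
W = {k: rng.standard_normal((n, n + d)) for k in "ifog"}
h, c = np.zeros(n), np.zeros(n)
h, c = lstm_step(rng.standard_normal(d), h, c, W)
```

The additive cell-state update (f * c + i * g) is what lets gradients flow across many time steps without vanishing the way they do in a plain RNN.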
What’s the connection to LANGUAGE?
Language is:
Dependent on overall context
Often short-term sequential
Other Variation: GRU
Optimizing the memory by forgetting leads to the Gated Recurrent Unit (GRU).
Other Variation: DAN
Dropping word order leads to a Deep Averaging Network (DAN).
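The order-dropping idea is just averaging: a DAN represents a sentence as the mean of its word vectors before feeding it to a feed-forward network. A sketch with made-up word vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical word vectors for a 4-word sentence (dimension 5).
word_vectors = rng.standard_normal((4, 5))

# A DAN ignores word order: the sentence representation is simply
# the average of its word vectors.
sentence_vec = word_vectors.mean(axis=0)

# Any permutation of the words yields the identical representation.
shuffled = word_vectors[[2, 0, 3, 1]]
assert np.allclose(shuffled.mean(axis=0), sentence_vec)
```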
Anecdotal Applications of LSTMs
High Level Questions?
Image citations
Neuron: https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/
https://towardsdatascience.com/power-of-a-single-neuron-perceptron-c418ba445095
NN: https://towardsdatascience.com/everything-you-need-to-know-about-neural-networks-and-backpropagation-machine-learning-made-easy-e5285bc2be3a
Sigmoid: https://en.wikipedia.org/wiki/Sigmoid_function#/media/File:Logistic-curve.svg
Tanh: https://mathworld.wolfram.com/HyperbolicTangent.html
Relu: https://medium.com/@sonish.sivarajkumar/relu-most-popular-activation-function-for-deep-neural-networks-10160af37dda
Leaky Relu: https://medium.com/@himanshuxd/activation-functions-sigmoid-relu-leaky-relu-and-softmax-basics-for-neural-networks-and-deep-8d9c70eed91e
RNNs, LSTMS: https://towardsdatascience.com/understanding-rnn-and-lstm-f7cdf6dfc14e
GRUs: https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
DAN: https://mlexplained.com/2018/05/11/paper-dissected-deep-unordered-composition-rivals-syntactic-methods-for-text-classification-explained/