
Page 1:

Learning with neural networks

Dr. Xiaowei Huang

https://cgi.csc.liv.ac.uk/~xiaowei/

Page 2:

Up to now,

• Traditional Machine Learning Algorithms

• Deep learning
  • Introduction to Deep Learning
  • Functional view and features

Page 3:

Topics

• Forward and backward computation

• Back-propagation and the chain rule

• Regularization

Page 4:

Initialisation

• Collect annotated data

• Define model and initialize randomly

• Predict based on current model
  • In neural network jargon, “forward propagation”

• Evaluate predictions and update model weights

Pages 5-7:

Forward computations

• Collect annotated data

• Define model and initialize randomly

• Predict based on current model
  • In neural network jargon, “forward propagation”

• Evaluate predictions and update model weights


Pages 8-12:

Backward computations

• Collect gradient data

• Define model and initialize randomly

• Predict based on current model
  • In neural network jargon, “backpropagation”

• Evaluate predictions and update model weights


Page 13:

Recall: Training Objective

Given a training corpus $\{X, Y\}$, find optimal parameters:

$$\theta^* = \arg\min_{\theta} \sum_{(x,y)\in(X,Y)} \ell\big(y,\, f(x;\theta)\big)$$

where $\ell$ is the loss function, $f(x;\theta)$ is the prediction, $y$ is the ground truth, and the sum over the corpus is the accumulated loss. We look for an optimal model parameterised over $\theta$.
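A schematic rendering of this objective in code, with `predict` and `loss` as hypothetical placeholders for the model $f(x;\theta)$ and the per-sample loss:

```python
# Schematic accumulated loss over a corpus {X, Y}; `predict` and `loss`
# stand in for the model f(x; theta) and the per-sample loss function.
def total_loss(predict, loss, X, Y, theta):
    return sum(loss(y, predict(x, theta)) for x, y in zip(X, Y))
```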

Page 14:

Forward Computation

Page 15:

Example

Network: [figure: a small feedforward network with sigmoid units]

Loss function: squared error over the outputs, $E = \sum_{o} \tfrac{1}{2}\,(\text{target}_o - \text{out}_o)^2$

Pages 16-18:

Example

• Training data (a single sample): given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99.

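The concrete network and its initial weights live in the slide figure. As a runnable sketch of the forward propagation for this sample, assuming a 2-2-2 sigmoid network with illustrative initial weights (the values below are assumptions, not read from the deck):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# The single training sample from the slides.
x = [0.05, 0.10]          # inputs
t = [0.01, 0.99]          # target outputs

# Hypothetical initial weights and biases (the deck's values are in a figure).
w = {"w1": 0.15, "w2": 0.20, "w3": 0.25, "w4": 0.30,   # input -> hidden
     "w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55}   # hidden -> output
b1, b2 = 0.35, 0.60

# Forward propagation: hidden layer.
h1 = sigmoid(w["w1"] * x[0] + w["w2"] * x[1] + b1)
h2 = sigmoid(w["w3"] * x[0] + w["w4"] * x[1] + b1)

# Forward propagation: output layer.
o1 = sigmoid(w["w5"] * h1 + w["w6"] * h2 + b2)
o2 = sigmoid(w["w7"] * h1 + w["w8"] * h2 + b2)

# Squared-error loss accumulated over the two outputs.
E = 0.5 * (t[0] - o1) ** 2 + 0.5 * (t[1] - o2) ** 2
print(o1, o2, E)   # with these weights: ~0.7514, ~0.7729, E ~ 0.2984
```

With any such initialization the outputs start far from the targets; the backward pass below is what moves the weights to reduce E.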

Page 19:

Backward Propagation

Page 20:

Recall: Minimizing with multi-dimensional inputs

• We often minimize functions with multi-dimensional inputs

• For minimization to make sense there must still be only one (scalar) output

Page 21:

Functions with multiple inputs

• Partial derivatives: $\frac{\partial f(x)}{\partial x_i}$ measures how $f$ changes as only the variable $x_i$ increases at point $x$

• The gradient generalizes the notion of derivative to the case where the derivative is with respect to a vector

• The gradient is the vector containing all of the partial derivatives, denoted $\nabla_x f(x) = \left(\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_n}\right)$

Note: in the training-objective case, $f$ is the loss and the parameter $x$ is $\theta$.

Page 22:

Optimization through Gradient Descent

• As with many models, we optimize our neural network with gradient descent

• The most important component in this formulation is the gradient

• Backpropagation to the rescue
  • The backward computations of the network return the gradients
  • How do we make the backward computations? (See the sketch below for the update itself.)
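To see the descent update in isolation before the full backward pass, a minimal sketch on an invented scalar objective $L(\theta) = (\theta - 3)^2$:

```python
# Gradient descent on a toy objective L(theta) = (theta - 3)**2,
# showing the update theta <- theta - eta * dL/dtheta in isolation.
theta, eta = 0.0, 0.1
for _ in range(50):
    grad = 2 * (theta - 3)   # dL/dtheta
    theta -= eta * grad
print(theta)                  # approaches the minimiser theta = 3
```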

Page 23:

Weight Update

$$\theta^{(t+1)} = \theta^{(t)} - \eta_t \, \nabla_\theta L(\theta^{(t)})$$

where $\eta_t$ is the learning rate at iteration $t$.

Pages 24-27:

Example

[Worked on the slides in three numbered steps: applying the chain rule to compute the gradient of the loss with respect to a single weight.]
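Continuing the forward-pass sketch from earlier, a hedged version of this three-step computation for one hidden-to-output weight; the weight names and the learning rate remain illustrative assumptions, not values from the slides:

```python
# Three chain-rule factors for dE/dw5, reusing x, t, h1, o1 and w from the
# forward-pass sketch above (all illustrative values).
dE_do1   = -(t[0] - o1)       # 1. dE/do1 for E = 0.5*(t1 - o1)^2 + ...
do1_dnet = o1 * (1.0 - o1)    # 2. sigmoid derivative at the output unit
dnet_dw5 = h1                 # 3. net_o1 = w5*h1 + w6*h2 + b2
dE_dw5 = dE_do1 * do1_dnet * dnet_dw5

eta = 0.5                     # assumed learning rate
w["w5"] -= eta * dE_dw5       # gradient-descent weight update
```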

Pages 28-32:

Chain rule

• Assume a nested function, $z = f(y)$ and $y = g(x)$

• Chain rule for scalars $x, y, z$: $\dfrac{dz}{dx} = \dfrac{dz}{dy}\,\dfrac{dy}{dx}$

• When $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$, $z \in \mathbb{R}$: $\dfrac{\partial z}{\partial x_i} = \sum_{j=1}^{n} \dfrac{\partial z}{\partial y_j}\,\dfrac{\partial y_j}{\partial x_i}$
  • i.e., gradients from all possible paths

• or in vector notation: $\nabla_x z = \left(\dfrac{\partial y}{\partial x}\right)^{\top} \nabla_y z$
  • $\dfrac{\partial y}{\partial x}$ is the Jacobian

Page 33:

The Jacobian

• When $x \in \mathbb{R}^3$, $y \in \mathbb{R}^2$:

$$\frac{\partial y}{\partial x} = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \frac{\partial y_1}{\partial x_3} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \frac{\partial y_2}{\partial x_3} \end{pmatrix}$$

Page 34:

Chain rule in practice

• $f(y) = \sin(y)$, $y = g(x) = 0.5x^2$

$$\frac{df}{dx} = \frac{df}{dy}\,\frac{dy}{dx} = \cos(y) \cdot x = \cos(0.5x^2)\,x$$
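A quick numerical sanity check of that derivative at an arbitrary test point:

```python
import math

# Chain rule for f(g(x)) with f(y) = sin(y), g(x) = 0.5*x**2:
# analytic df/dx = cos(0.5*x**2) * x, checked against a central difference.
def df_dx(x):
    return math.cos(0.5 * x * x) * x

x, eps = 1.3, 1e-6
numeric = (math.sin(0.5 * (x + eps) ** 2) - math.sin(0.5 * (x - eps) ** 2)) / (2 * eps)
print(df_dx(x), numeric)   # the two values agree to roughly 1e-9
```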

Page 35:

Backpropagation

Page 36:

General Workflow
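The workflow itself is shown as a figure on the slide. As a minimal sketch of what such a loop typically looks like, assuming fully connected sigmoid layers and the squared-error loss of the running example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(weights, x, t, eta=0.5):
    """One forward + backward pass over a stack of fully connected sigmoid
    layers with squared-error loss; `weights` is a list of (W, b) pairs."""
    # Forward propagation: keep every activation, the backward pass reuses them.
    activations = [x]
    for W, b in weights:
        activations.append(sigmoid(W @ activations[-1] + b))

    # Backward propagation: start from dE/d(output), apply the chain rule per layer.
    delta = (activations[-1] - t) * activations[-1] * (1 - activations[-1])
    for i in reversed(range(len(weights))):
        W, b = weights[i]
        a_prev = activations[i]
        grad_W = np.outer(delta, a_prev)               # dE/dW for this layer
        grad_b = delta                                 # dE/db for this layer
        delta = (W.T @ delta) * a_prev * (1 - a_prev)  # push gradient one layer back
        weights[i] = (W - eta * grad_W, b - eta * grad_b)
    return weights

# e.g. one update with the running example's shapes and sample:
rng = np.random.default_rng(0)
weights = [(rng.normal(size=(2, 2)), np.zeros(2)) for _ in range(2)]
weights = train_step(weights, np.array([0.05, 0.10]), np.array([0.01, 0.99]))
```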

Page 37:

Regularization as Constraints

Page 38:

Recall: what is regularization?

• In general: any method to prevent overfitting or help the optimization

• Specifically: additional terms in the training optimization objective to prevent overfitting or help the optimization

Page 39:

Recall: Overfitting

• Key: empirical loss and expected loss are different

• The smaller the data set, the larger the difference between the two

• The larger the hypothesis class, the easier it is to find a hypothesis that fits the difference between the two
  • Thus it has small training error but large test error (overfitting)

• A larger data set helps

• Throwing away useless hypotheses also helps (regularization)

Page 40:

Regularization as hard constraint

• Training objective (adding constraints):
  $\min_f \ \hat{L}(f) = \frac{1}{N}\sum_{i=1}^{N} \ell(f, x_i, y_i)$ subject to $f \in \mathcal{H}$

• When parametrized:
  $\min_\theta \ \hat{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \ell(\theta, x_i, y_i)$ subject to $\theta \in \Omega$

Page 41:

Regularization as hard constraint

• When $\Omega$ is measured by some quantity $R$: $\min_\theta \hat{L}(\theta)$ subject to $R(\theta) \le r$

• Example: $\ell_2$ regularization: $\min_\theta \hat{L}(\theta)$ subject to $\|\theta\|_2^2 \le r^2$

Page 42:

Regularization as soft constraint

• The hard-constraint optimization is equivalent to the soft-constraint one:

  $\min_\theta \ \hat{L}(\theta) + \lambda^* R(\theta)$

  for some regularization parameter $\lambda^* > 0$

• Example: $\ell_2$ regularization: $\min_\theta \hat{L}(\theta) + \lambda^* \|\theta\|_2^2$
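In gradient-descent terms, the soft $\ell_2$ penalty simply adds $2\lambda^*\theta$ to the gradient (“weight decay”). A small sketch, with invented values for the penalty strength and the loss gradient:

```python
import numpy as np

# l2 regularization as weight decay: the penalty lam * ||theta||^2
# contributes 2 * lam * theta to the gradient of the soft objective.
def sgd_step_l2(theta, grad_loss, lam=1e-3, eta=0.5):
    grad = grad_loss + 2 * lam * theta   # d/dtheta [ L(theta) + lam * ||theta||^2 ]
    return theta - eta * grad

theta = np.array([0.4, -0.7])
grad_loss = np.array([0.08, -0.02])      # hypothetical dL/dtheta at theta
print(sgd_step_l2(theta, grad_loss))
```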

Page 43:

Regularization as soft constraint

• Shown by the Lagrangian multiplier method:
  $\mathcal{L}(\theta, \lambda) = \hat{L}(\theta) + \lambda\,(R(\theta) - r)$

• Suppose $\theta^*$ is the optimum of the hard-constraint optimization

• Suppose $\lambda^*$ is the corresponding optimum of the max over $\lambda$