
Page 1:

Neural Networks Workshop

Tony Allen

Department of Mathematics, Purdue University

July 1, 2019

Page 2:

Helpful Resources

- Goodfellow, Bengio, and Courville's Deep Learning: https://www.deeplearningbook.org/

- François Chollet's Deep Learning with Python: https://github.com/fchollet/deep-learning-with-python-notebooks

- Dr. Buzzard's MA598 course notes: https://www.math.purdue.edu/~buzzard/MA598-Spring2019/

- Nick Winovich's SIAM@Purdue TensorFlow Workshop: https://www.math.purdue.edu/~nwinovic/workshop.html

Page 3:

What is Machine Learning?

Machine learning shifts the paradigm: instead of programming explicit rules to produce answers, we program the machine to discover the rules from data and answers.

Diagram adapted from François Chollet's Deep Learning with Python

Page 4:

Neural Networks

Artificial neuron: the building block of a neural network

Mathematically,

y = f(w_1 x_1 + w_2 x_2 + w_3 x_3 + b) = f(w^T x + b)

Diagram from Nick Winovich
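To make the formula concrete, here is a minimal numpy sketch of a single neuron with a sigmoid activation (the input values, weights, and bias are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # inputs x1, x2, x3
w = np.array([0.1, 0.4, -0.3])   # weights w1, w2, w3
b = 0.2                          # bias

y = sigmoid(w @ x + b)           # y = f(w^T x + b)
print(y)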

Page 6:

Neural Networks

Page 7:

Neural Networks

We can combine the corresponding equations

y_1 = f(w_1^T x + b_1)
y_2 = f(w_2^T x + b_2)

into a single matrix-vector equation

y = f(Wx + b)

If we have N inputs and M outputs, then W is a dense M × N matrix.

(For the picky: f is applied element-wise.)
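The same computation in matrix form, sketched in numpy with N = 3 inputs and M = 2 outputs (all values are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])     # input vector, length N = 3
W = np.array([[0.1, 0.4, -0.3],    # row 1 is w_1^T
              [0.2, -0.5, 0.7]])   # row 2 is w_2^T, so W is M x N = 2 x 3
b = np.array([0.2, -0.1])          # bias vector, length M = 2

y = sigmoid(W @ x + b)             # equals [f(w_1^T x + b_1), f(w_2^T x + b_2)]
print(y)                           # two outputs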

Page 8:

Dense (Fully Connected) Layer

Diagram from Nick Winovich

Page 9:

Network Depth

y = f_2(W_2 f_1(W_1 x + b_1) + b_2)

Diagram from Nick Winovich

Page 10:

Network Depth

y = f_3(W_3 f_2(W_2 f_1(W_1 x + b_1) + b_2) + b_3)

Diagram from Nick Winovich
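For reference, a three-layer composition of this form could be written in Keras roughly as follows (a sketch assuming TensorFlow's bundled Keras; the layer widths, activations, and input size are illustrative, not taken from the slides):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(784,)),   # f_1(W_1 x + b_1)
    layers.Dense(64, activation="relu"),                        # f_2(W_2 . + b_2)
    layers.Dense(10, activation="softmax"),                     # f_3(W_3 . + b_3)
])
model.summary()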

Page 11:

A Word on Activation Functions

Activation functions are a fundamental component of network architecture: they provide non-linear modeling capacity and control the gradient flow that guides training.

Figures from Nick Winovich
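Three common choices, written out in numpy for reference (which activations the original figures plot is not recorded here):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1)

def tanh(z):
    return np.tanh(z)                 # output in (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # zero for negative inputs, identity otherwise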

Page 12:

Optimization (How to learn)

Goal: learn the weights so that the network gives the desired output.

Everything today will be Supervised Learning:

x → Neural Net → ŷ;  compare with the target y via loss(ŷ, y);  update the weights.

Adjust the weights w_{i,j} to minimize the loss.

Page 13:

Gradient Descent

From calculus: the greatest decrease in a function is in the direction opposite to the gradient.

Let θ be all the parameters (weights and biases) and E the total loss over all the data. Then iteratively apply a method called gradient descent:

θ_{k+1} = θ_k − α_k ∇E(θ_k)

However, computing the gradient of the loss over all the data can be expensive, so instead we compute it over random subsets of the data (batches). This leads to stochastic gradient descent (SGD) algorithms.
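A toy numpy sketch of this update rule, fitting a single slope parameter to made-up data with random mini-batches (everything here is illustrative, not part of the workshop notebook):

import numpy as np

np.random.seed(0)
x = np.random.randn(100)
y = 3.0 * x + 0.1 * np.random.randn(100)               # data with true slope 3

theta, alpha = 0.0, 0.1                                # parameter and learning rate
for k in range(200):
    idx = np.random.choice(100, size=10, replace=False)   # random batch
    xb, yb = x[idx], y[idx]
    grad = np.mean(2.0 * (xb * theta - yb) * xb)           # dE/dtheta on the batch, E = mean((x*theta - y)^2)
    theta = theta - alpha * grad                            # theta_{k+1} = theta_k - alpha_k * grad
print(theta)                                            # close to 3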

Page 16:

Back Propagation

How do we compute the gradient, i.e., ∂E/∂w_{ij}?

The answer: the chain rule!
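A hand-computed example of the chain rule for a single sigmoid neuron with a squared loss (a sketch with made-up values, not the workshop's notation):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
b, t = 0.2, 1.0                       # bias and target output

z = w @ x + b                         # forward pass
y = sigmoid(z)
E = (y - t) ** 2                      # loss

dE_dy = 2.0 * (y - t)                 # backward pass, one chain-rule factor at a time
dy_dz = y * (1.0 - y)                 # derivative of sigmoid evaluated at z
dE_dw = dE_dy * dy_dz * x             # dE/dw_i = dE/dy * dy/dz * dz/dw_i
dE_db = dE_dy * dy_dz                 # dz/db = 1
print(dE_dw, dE_db)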

Page 17:

Back Propagation

Diagram from Nick Winovich

Pages 18-20:

Back Propagation (continued)

Diagrams from Nick Winovich

Page 21:

Let’s actually do something! (Exercise 1)

The notebook to follow along can be found on the workshop homepage: https://engineering.purdue.edu/ChanGroup/MLworkshop2019.html

Page 22:

Overfitting

In some cases, a network can learn too much: it performs well on the training data but fails to generalize to the test data. Solutions include regularization and dropout.

Page 23:

Regularization and Dropout

Regularization adds a penalty for large weights to the loss function. Commonly, we use

- the L1 norm, which encourages sparsity

- the L2 norm, which encourages small weights

loss = loss + λ‖θ‖_1 (or λ‖θ‖_2)

Dropout temporarily ignores random nodes (each with a fixed probability p) during each training iteration, which ensures that no individual node dominates. A Keras sketch of both techniques appears below.
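A hedged Keras sketch showing both ideas together (assuming TensorFlow's bundled Keras; the layer sizes, input shape, and penalty strength are illustrative):

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(784,),
                 kernel_regularizer=regularizers.l2(1e-4)),   # adds 1e-4 * sum(W**2) to the loss
    layers.Dropout(0.5),                                       # ignore each node with p = 0.5, training only
    layers.Dense(10, activation="softmax"),
])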

Page 24:

Exercise 2

In this exercise, we will try to improve the previous model by adding dropout. You should use the Keras documentation (https://keras.io/) to create a network with at least 2 hidden layers that use dropout. Plot the training loss, print the test accuracy, and compare to the previous model.

Page 25:

Convolutional Neural Networks (CNNs)

CNNs are useful when the data is spatially structured (e.g. images).

The key concept behind CNNs is that of kernels/filters, which are used in hand-crafted feature detection.

What are good, distinguishing features? How do we mathematically extract such features?

https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1

Page 26:

Convolution

https://github.com/PetarV-/TikZ/tree/master

Page 27:

Pop Quiz

Let

    I = [ 1 2 0 1 3 ]
        [ 2 0 1 4 0 ]
        [ 7 0 9 5 5 ]
        [ 8 5 2 6 0 ]
        [ 8 0 0 1 4 ]

and

    K = [ 1 1 0 ]
        [ 0 0 0 ]
        [ 0 0 2 ]

1. What is (I ∗ K)(1, 1)?

2. What is the size of I ∗ K?

Page 28:

Matrix View

In practice, we perform a convolution as one large matrix-vector product that does all the work in one go.

With the same I and K as in the pop quiz, flatten I into a long column vector and expand K into a sparse, Toeplitz-structured matrix:

    I ∗ K  =  [ 1 1 0 ··· 2 0 0 ··· 0 ] [ 1 ]     [ 21 ]
              [ 0 1 1 ··· 0 2 0 ··· 0 ] [ 2 ]     [ 12 ]
              [ 0 0 1 ··· 0 0 2 ··· 0 ] [ 0 ]  =  [ 11 ]
              [            ⋮          ] [ ⋮ ]     [  ⋮ ]
              [ 0 0 0 ··· 1 1 0 ··· 2 ] [ 4 ]     [ 22 ]

Key words: Toeplitz, sparse
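To check the numbers, the same "valid" convolution (computed, as in CNN layers, without flipping the kernel) can be done with explicit loops in numpy:

import numpy as np

I = np.array([[1, 2, 0, 1, 3],
              [2, 0, 1, 4, 0],
              [7, 0, 9, 5, 5],
              [8, 5, 2, 6, 0],
              [8, 0, 0, 1, 4]])
K = np.array([[1, 1, 0],
              [0, 0, 0],
              [0, 0, 2]])

out_h = I.shape[0] - K.shape[0] + 1       # output height for a "valid" convolution
out_w = I.shape[1] - K.shape[1] + 1
out = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        out[i, j] = np.sum(I[i:i+3, j:j+3] * K)   # kernel laid over one 3x3 patch
print(out.shape)   # (3, 3)
print(out)         # first entry 1 + 2 + 2*9 = 21, last entry 9 + 5 + 2*4 = 22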

Page 29:

Convolutional Neural Network

Key Ideas of a CNN:

1. Instead of expensive dense matrix-vector products, do convolutions.

2. Everything else stays the same (activation functions, training, etc.).

CNNs scale very well to large images because of their sparse connections and natural spatial invariance.

Of course I have skipped some details, so let me touch on those:

- Stride

- Padding

- Pooling

Page 31:

Stride and Padding

When defining a convolution, we need to specify how far the kernel moves at each step and what happens at the edges of the image; these are the stride and the padding, respectively. Both determine the size of the output.

In Keras (a short Conv2D sketch follows this list),

- strides = 2 determines how many pixels the kernel moves at a time (in this case, two)

- padding = "same" puts zeros around the image so that the output is the same size as the input; this is called zero-padding

- padding = "valid" adds no padding: the kernel is only applied where it fits entirely inside the image, so the output is smaller than the input
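A hedged Keras sketch of these options (assuming TensorFlow's bundled Keras; the filter count and kernel size are illustrative):

from tensorflow.keras import layers

# zero-padded: output spatial size is ceil(input_size / stride)
conv_same = layers.Conv2D(32, kernel_size=3, strides=2, padding="same",
                          activation="relu")

# no padding: each spatial dimension shrinks by kernel_size - 1
conv_valid = layers.Conv2D(32, kernel_size=3, strides=1, padding="valid",
                           activation="relu")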

Page 32:

Max Pooling

Often, all we care about is whether a feature is present. Max pooling is one way to reduce dimensionality while keeping information about the presence of a feature.

In my experience, you see this applied after a stride-1 convolution with zero-padding.

https://computersciencewiki.org/index.php/Max-pooling
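A hedged Keras sketch of the pattern just described: a stride-1, zero-padded convolution followed by 2x2 max pooling (layer sizes and input shape are illustrative):

from tensorflow import keras
from tensorflow.keras import layers

block = keras.Sequential([
    layers.Conv2D(32, kernel_size=3, strides=1, padding="same",
                  activation="relu", input_shape=(28, 28, 1)),   # 28x28 input stays 28x28
    layers.MaxPooling2D(pool_size=2),                            # keep the max of each 2x2 window -> 14x14
])
block.summary()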

Page 33:

Exercise 3

In this exercise, we implement a CNN and see how much better it performs on our image classification task.

Page 34:

Fin.
