
Page 1

INTRODUCTION TO MACHINE LEARNING

NEURAL NETWORKS I

Mingon Kang, Ph.D.

Department of Computer Science @ UNLV


Page 2

Begin with a single Perceptron


Ref: Dr. Manuela Veloso, CMU

Page 3

Threshold Functions


Ref: Dr. Manuela Veloso, CMU

Page 4

Boolean Functions and Perceptron


Ref: Dr. Manuela Veloso, CMU


Page 5

Discussion on Perceptrons

A single perceptron works well to classify a linearly separable set of inputs.

Multi-layer perceptrons were proposed as a "solution" for representing nonlinearly separable functions (1950s).

Many local minima; the optimization problem is non-convex.

Research in neural networks largely stopped until the 70s.

Ref: Dr. Manuela Veloso, CMU

Page 6

XOR?

$\mathbf{X} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}$, $\quad g(x) = \max(x, 0)$

$\mathbf{W}_1 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, $\quad \mathbf{W}_2 = [1, -2]$, $\quad c = [0, -1]$

These solve XOR through the composition $f(\boldsymbol{x}) = \mathbf{W}_2\, g(\mathbf{W}_1 \boldsymbol{x} + c)$.
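As a quick numeric check, the four inputs can be pushed through this two-layer network in NumPy (a sketch; the variable names are mine, the weights and $g$ come from the slide):

```python
import numpy as np

# XOR inputs, one example per row
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

W1 = np.array([[1, 1], [1, 1]])   # first-layer weights
c  = np.array([0, -1])            # first-layer bias
W2 = np.array([1, -2])            # second-layer weights

g = lambda z: np.maximum(z, 0)    # g(x) = max(x, 0)

h = g(X @ W1 + c)                 # hidden activations for all four inputs
y = h @ W2                        # network outputs
print(y)                          # [0 1 1 0], i.e., XOR of the two inputs
```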


Page 7

Neural Network

Feedforward Networks or Multilayer Perceptrons (MLPs)

E.g., for a classifier, $y = f^*(\boldsymbol{x})$ maps an input $\boldsymbol{x}$ to a category $y$.

A feedforward network defines a mapping $y = f(\boldsymbol{x}; \boldsymbol{\theta})$ and learns the parameters $\boldsymbol{\theta}$ that best approximate $f^*$.

Page 8

Neural Network

Called networks because they compose together many different functions.

Associated with a directed acyclic graph that represents how the functions are composed.

Mostly chain structures:

$f(x) = f^{(3)}\big(f^{(2)}\big(f^{(1)}(x)\big)\big)$

where $f^{(i)}$ is the $i$-th layer of the network.
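The chain is just function composition; here is a toy sketch in NumPy (the three "layers" below are made-up stand-ins, not trained layers):

```python
import numpy as np

# three toy layer functions, composed into one network function
f1 = lambda x: np.maximum(0, x)   # e.g., a ReLU stage
f2 = lambda x: 2.0 * x            # a toy linear stage
f3 = lambda x: x + 1.0            # a toy output stage

f = lambda x: f3(f2(f1(x)))       # f(x) = f3(f2(f1(x)))
print(f(np.array([-1.0, 2.0])))   # [1. 5.]
```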

Page 9

Neural Network

(Figure: a network drawn layer by layer, from the input layer through hidden layers to the output layer; the number of layers is the network's depth, and the number of units in a layer is its width.)

Page 10

Design of a NN model

Architecture of the network:

How many layers

How these layers are connected to each other

How many units are in each layer

Activation function: how to compute the hidden layer values

Cost function: the objective against which the model is optimized

Optimizer: how to optimize the model

Page 11

Weights and Bias

Labeling the Weights of a Neural Network

$b^l_j$ is the bias of the $j$-th neuron of the $l$-th layer (e.g., $b^3_1$).

Likewise, $w^l_{jk}$ (used on the next page) denotes the weight into the $j$-th neuron of the $l$-th layer from the $k$-th neuron of the $(l-1)$-th layer.

Page 12

Weighted Input and Activation of a Neuron

The weighted input of the $j$-th neuron of the $l$-th layer is $z^l_j$:

$z^l_j = \sum_k w^l_{jk}\, a^{l-1}_k + b^l_j$

The activation of the $j$-th neuron of the $l$-th layer is $a^l_j$:

$a^l_j = f(z^l_j)$, where $f$ is the activation function, i.e.,

$a^l_j = f\!\left(\sum_k w^l_{jk}\, a^{l-1}_k + b^l_j\right)$

In matrix notation, the activation becomes:

$a^l = f(w^l a^{l-1} + b^l)$
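The matrix form translates directly to NumPy; here is a sketch of one forward pass through a tiny 2-3-1 network (the sizes and random weights are illustrative, not from the slides):

```python
import numpy as np

def layer_forward(a_prev, W, b, f):
    """Compute a^l = f(w^l a^{l-1} + b^l) for one layer."""
    return f(W @ a_prev + b)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # layer 1: 2 inputs -> 3 units
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # layer 2: 3 units -> 1 output

x = np.array([0.5, -0.2])                       # an arbitrary input
a1 = layer_forward(x, W1, b1, sigmoid)
a2 = layer_forward(a1, W2, b2, sigmoid)
print(a2)                                       # the network's output activation
```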

Page 13

Example


Ref: https://visualstudiomagazine.com/articles/2013/05/01/neural-network-feed-forward.aspx

Page 14

Activation Functions

• Sigmoid activation: $a^l_j = \sigma(z^l_j) = \dfrac{1}{1 + e^{-z^l_j}}$

• Softmax activation: $a^l_j = \dfrac{e^{z^l_j}}{\sum_k e^{z^l_k}}$

• Tanh activation: $\tanh(z^l_j) = \dfrac{e^{z^l_j} - e^{-z^l_j}}{e^{z^l_j} + e^{-z^l_j}}$

• Rectified linear activation: $\max(0, z^l_j)$, the maximum of $0$ and $z^l_j$
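All four are one-liners in NumPy; a sketch (the stability shift in softmax is a standard trick, not from the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))        # subtracting the max avoids overflow
    return e / e.sum()

def tanh(z):
    return np.tanh(z)                # (e^z - e^-z) / (e^z + e^-z)

def relu(z):
    return np.maximum(0, z)          # max(0, z), elementwise

z = np.array([-1.0, 0.0, 2.0])
print(sigmoid(z), softmax(z), tanh(z), relu(z))
```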

Page 15

Activation Functions


Page 16

Activation Functions


Page 17

Universal Approximation Theorem

One hidden layer may be enough to represent (not learn) an approximation of any function to an arbitrary degree of accuracy.

So why go deeper?

A shallow net may need (exponentially) more width.

A shallow net may overfit more.

Page 18

Cost Functions

• Quadratic cost: $C = \dfrac{1}{2n} \sum_x \lVert y(x) - a^L(x) \rVert^2$

• Binary cross-entropy cost: $C = -\dfrac{1}{n} \sum_x \left[\, y(x) \ln a^L(x) + \big(1 - y(x)\big) \ln\big(1 - a^L(x)\big) \right]$

• Negative log-likelihood cost: $C = -\ln a^L_y$

• A cost function must satisfy the following two conditions (for backpropagation):

• The cost function $C$ should be calculated as an average over the cost functions $C_x$ for individual training examples.

• The cost functions $C_x$ for the individual training examples, and consequently the cost function $C$, must be a function of the outputs of the neural network.
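Per-example versions $C_x$ of these costs are short in NumPy (a sketch; the function names are mine):

```python
import numpy as np

def quadratic_cost(y, a):
    # C_x = (1/2) ||y - a||^2; averaging over the n examples gives C
    return 0.5 * np.sum((y - a) ** 2)

def binary_cross_entropy_cost(y, a):
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

def negative_log_likelihood_cost(a, true_class):
    # a: output probabilities, true_class: index of the correct label
    return -np.log(a[true_class])

a = np.array([0.1, 0.3, 0.5, 0.1])
print(negative_log_likelihood_cost(a, 3))   # 2.302..., the example on the next page
```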

Page 19

Examples of Negative Log-likelihood Cost

Given the posterior probabilities and the ground truth:

A set of output probabilities, e.g., [0.1, 0.3, 0.5, 0.1]

Ground truth, e.g., [0, 0, 0, 1]

Likelihood: 0×0.1 + 0×0.3 + 0×0.5 + 1×0.1 = 0.1

NLL: −ln(0.1) ≈ 2.3

If the ground truth is [0, 0, 1, 0]:

Likelihood: 0×0.1 + 0×0.3 + 1×0.5 + 0×0.1 = 0.5

NLL: −ln(0.5) ≈ 0.69
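The same arithmetic, checked in NumPy (a small sketch):

```python
import numpy as np

probs = np.array([0.1, 0.3, 0.5, 0.1])        # output probabilities
for truth in ([0, 0, 0, 1], [0, 0, 1, 0]):    # the two one-hot ground truths
    likelihood = np.dot(truth, probs)         # probability assigned to the true class
    print(likelihood, -np.log(likelihood))    # 0.1 -> 2.30..., 0.5 -> 0.69...
```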


Page 20

Output Types


Page 21

Discussion

Construct a neural network for MNIST data: 28×28-pixel images and 10 labels.

Construct a neural network for a binary classification problem: 10 features and 2 labels.

Construct a neural network for predicting tomorrow's temperature in degrees: 10 variables.

One possible answer to the first exercise is sketched below; the other two mainly change the output layer.
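A Keras sketch for the MNIST exercise (the hidden-layer width, optimizer, and loss are illustrative choices, not prescribed by the slides):

```python
import tensorflow as tf

# one possible MNIST classifier: 784 inputs -> one hidden layer -> 10 softmax outputs
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28*28 pixels -> 784 inputs
    tf.keras.layers.Dense(128, activation="relu"),    # hidden width 128 is a choice
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 labels -> softmax output
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",  # a cross-entropy cost
              metrics=["accuracy"])
```

For the binary problem the output layer would shrink to a single sigmoid unit (with a binary cross-entropy cost); for the temperature regression, to a single linear unit (with a quadratic cost).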
