
Page 1: Artificial Neural Network (draft)

Artificial Neural Networks: An Introduction

Page 2: Artificial Neural Network (draft)

Why study ANNs?

• To understand how the brain actually works

• To understand a type of parallel computation
  • IBM's "The Brain Chip" (1 million neurons and 256 million synapses)

• To solve practical problems
  • Artificial neural networks should be good at things brains are good at (vision, speech recognition) and bad at things brains are bad at (e.g., 32 * 71 = ???)

Page 3: Artificial Neural Network (draft)
Page 4: Artificial Neural Network (draft)

What neurons look like

Neuron – an electrically excitable cell that transmits information.
  • Dendrites receive signals from many other neurons; these signals can be either excitatory or inhibitory.
  • The soma (cell body) processes this information; above a certain threshold, an electrical signal is fired down the axon.

Our model will be simplified:
  • Synapses -> weighted inputs
  • Soma -> an activation function
  • Axon -> outputs

Page 5: Artificial Neural Network (draft)

A Feed Forward Neural Net (weighted connections)

Page 6: Artificial Neural Network (draft)

What can NNs do?

• Image recognition: MNIST handwritten digits; read reCAPTCHA better than humans do

• Speech recognition and NLP

• Answer the meaning of life

Page 7: Artificial Neural Network (draft)

Using a recurrent neural net to predict the next character

• In 2011, Ilya Sutskever trained on 5 million strings of 100 characters each, taken from Wikipedia

• Training took one month on a GPU

• Once trained, the neural net predicts the next character in a sequence of characters

• He fed it the phrase "The meaning of life is" _______________

Page 8: Artificial Neural Network (draft)

Ilya Sutskever, 2011

Page 9: Artificial Neural Network (draft)

A Brief History of ANNs

• 1943 – McCulloch and Pitts show that NN models can represent any Boolean function

• 1949 – Donald Hebb describes how learning might take place: "cells that fire together, wire together"

• 1958 – Rosenblatt's perceptron can learn linearly separable data

• 1969 – Minsky & Papert criticize the perceptron

• 1970–1986 – The dark ages of neural networks (no funding)

• 1986 – Rumelhart, Hinton, and Williams (and, independently, LeCun) describe the backpropagation algorithm for training neural networks of arbitrary depth (anticipated by Paul Werbos, 1974)

• 1997 – A.K. Dewdney: "Although neural nets do solve a few toy problems, their powers of computation are so limited that I am surprised anyone takes them seriously as a general problem-solving tool."

• Other techniques (Random Forests [1995] and Support Vector Machines [1995]) are considered state-of-the-art ML for classification problems

• 2006 – Second renaissance of neural networks, with new methods for training deep and recurrent NNs

Page 10: Artificial Neural Network (draft)

[Figure: "ANN Scholarly Publications Per Year (Ln Normalized)" – ln(# of publications) vs. year, 1930–2020, with the 1969 and 1986 turning points marked]

Page 11: Artificial Neural Network (draft)
Page 12: Artificial Neural Network (draft)
Page 13: Artificial Neural Network (draft)

1958 – Frank Rosenblatt's Perceptron

"the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."

Page 14: Artificial Neural Network (draft)

The Perceptron

Page 15: Artificial Neural Network (draft)

Weight space

Consider all the different sets of weights that will output the correct value for a 2-D input vector. Here, the threshold = 0.

Input vector with output value = 1

Good weight

Bad weight
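
For reference, the binary threshold rule behind this picture (with the threshold at 0; standard notation, since the slide's equations are not reproduced here) is:

```latex
y = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} \ge 0 \\ 0 & \text{otherwise} \end{cases}
```

A "good" weight vector for an input with target 1 is any w on the side of that input's hyperplane where w · x >= 0.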

Page 16: Artificial Neural Network (draft)

NAND example

Input Data:

  x1 x2 | target
   0  0 |   1
   1  0 |   1
   0  1 |   1
   1  1 |   0

One of many possible solutions:
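
A minimal sketch of one such solution in Python (the specific weights and bias are assumed here, since the slide's figure is not reproduced; any weights satisfying the four rows above would do):

```python
# A binary threshold unit (perceptron) computing NAND.
# Weights -2, -2 and bias 3 are one of many possible solutions (assumed values).

def nand_perceptron(x1, x2, w=(-2, -2), bias=3):
    """Output 1 if the weighted sum plus bias is >= 0, else 0."""
    z = w[0] * x1 + w[1] * x2 + bias
    return 1 if z >= 0 else 0

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, "->", nand_perceptron(x1, x2))   # 1, 1, 1, 0
```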

Page 17: Artificial Neural Network (draft)

NAND Decision Boundary

One possible solution:

Page 18: Artificial Neural Network (draft)

NAND Decision Boundary

  x1 x2 | target
   0  0 |   1
   1  0 |   1
   0  1 |   1
   1  1 |   0

One possible solution:

Page 19: Artificial Neural Network (draft)

XOR Problem

Training Data:

  x1 x2 | target
   0  0 |   0
   1  0 |   1
   0  1 |   1
   1  1 |   0

A single perceptron can only solve linearly separable problems.

Page 20: Artificial Neural Network (draft)

XOR Problem

Training Data:

  x1 x2 | target
   0  0 |   0
   1  0 |   1
   0  1 |   1
   1  1 |   0

Multiple layers of perceptrons solve the XOR problem, but Rosenblatt did not have a learning algorithm to set the weights.
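
To illustrate the claim above, here is a minimal sketch of a two-layer solution with hand-picked weights (the weights are assumptions for illustration, not the slide's figure): XOR(a, b) = AND(NAND(a, b), OR(a, b)).

```python
# Two-layer network of binary threshold units computing XOR.
# Hidden units compute NAND and OR; the output unit computes AND of the two.
# All weights are hand-picked (assumed), just to show that a second layer fixes XOR.

def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(-2 * x1 - 2 * x2 + 3)   # hidden unit 1: NAND
    h2 = step(2 * x1 + 2 * x2 - 1)    # hidden unit 2: OR
    return step(2 * h1 + 2 * h2 - 3)  # output unit: AND of the hidden units

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))   # 0, 1, 1, 0
```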

Page 21: Artificial Neural Network (draft)


Page 22: Artificial Neural Network (draft)

XOR Problem

Training Data:

  x1 x2 | target
   0  0 |   0
   1  0 |   1
   0  1 |   1
   1  1 |   0

Hidden-layer representation of the four inputs:

  1 0
  1 1
  1 1
  0 1

2 weight planes

Page 23: Artificial Neural Network (draft)

XOR Problem

Training Data:

  x1 x2 | target
   0  0 |   0
   1  0 |   1
   0  1 |   1
   1  1 |   0

Hidden-layer representation of the four inputs:

  1 0
  1 1
  1 1
  0 1

1 weight plane

Page 24: Artificial Neural Network (draft)

Sigmoid (Logistic Function)

• The sigmoid function is similar to the binary threshold function, but it is continuous: it "squashes" its input to a value between 0 and 1

• Its derivative has a nice property: it is computationally inexpensive to evaluate
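
In symbols (standard definitions, not reproduced on the slide):

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\frac{d\sigma}{dz} = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
```

The derivative reuses the activation σ(z) that forward propagation has already computed, which is why it is cheap.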

Page 25: Artificial Neural Network (draft)

Sigmoid (Logistic function)

Page 26: Artificial Neural Network (draft)

Sigmoid Neurons

Page 27: Artificial Neural Network (draft)

Sigmoid Neurons

• We can "bake in" the bias by augmenting the input with an extra element that we set to a constant value (say, 1) for every sample.

• The weight on that element now represents the bias value.

• With the bias "baked in," the model has a simpler notation and is more computationally efficient.
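
A sketch of this convention in symbols (the symbol names are assumptions, since the slide's notation is not reproduced):

```latex
\mathbf{x} = (x_0, x_1, \dots, x_n),\ x_0 = 1
\qquad\Rightarrow\qquad
y = \sigma\!\left(\sum_{i=0}^{n} w_i x_i\right) = \sigma(\mathbf{w}\cdot\mathbf{x}),
\quad \text{with } w_0 \text{ acting as the bias } b.
```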

Page 28: Artificial Neural Network (draft)

Forward propagation

Page 29: Artificial Neural Network (draft)
Page 30: Artificial Neural Network (draft)
Page 31: Artificial Neural Network (draft)
Page 32: Artificial Neural Network (draft)
Page 33: Artificial Neural Network (draft)

Matrix notation is easier to read and is used in production code.
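
A minimal NumPy sketch of forward propagation in matrix notation (the layer sizes, weight names, and sigmoid activations are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward propagation through a net with one hidden layer.

    x  : input vector, shape (n_in,)
    W1 : hidden-layer weights, shape (n_hidden, n_in); b1 has shape (n_hidden,)
    W2 : output-layer weights, shape (n_out, n_hidden); b2 has shape (n_out,)
    """
    z1 = W1 @ x + b1      # weighted inputs to the hidden layer
    a1 = sigmoid(z1)      # hidden activations
    z2 = W2 @ a1 + b2     # weighted inputs to the output layer
    y = sigmoid(z2)       # network outputs
    return a1, y

# Tiny example: 2 inputs, 3 hidden units, 1 output, random weights.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
a1, y = forward(np.array([0.5, -1.0]), W1, b1, W2, b2)
print(y)
```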

Page 34: Artificial Neural Network (draft)

How to train a feed forward net?

• There are several cost functions (cross entropy, classification error, squared error).

• To measure the error on this sample we will use the squared error (see below).

• Major difficulty: we know what the output target is, but nobody is telling us directly what the hidden units should be.
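
For a single training sample with targets t_j and network outputs y_j, the squared-error cost takes the standard form:

```latex
E = \frac{1}{2} \sum_{j} \left( t_j - y_j \right)^2
```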

Page 35: Artificial Neural Network (draft)

How to train a feed forward net?

• Try: randomly perturb one weight and see if it improves performance

• But this is very, very slow

Page 36: Artificial Neural Network (draft)

Backpropagation, 1986*

• The “backward propagation of errors” after forward propagation

• Here is the cost for a single training sample

• If we calculate the error derivatives w.r.t. each weight, we can update the weights with gradient descent.

Page 37: Artificial Neural Network (draft)

Backpropagating errors – Step 0: Feed forward through the network

Page 38: Artificial Neural Network (draft)

Backpropagating errors – Step 1: Backpropagate the error derivative to each node

Page 39: Artificial Neural Network (draft)

Backpropagating errors – Step 1: Backpropagate the error derivative to each node. Step 2: Use the node deltas to compute the incoming weight derivatives.

Page 40: Artificial Neural Network (draft)

Backpropagation error derivatives

Feed Forward / Back Propagate

Linear Output Neuron
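
A hedged summary of the derivatives in question, for a linear output neuron trained with squared error (standard backpropagation results, stated here because the slide's equations are not reproduced):

```latex
\delta_j \;=\; \frac{\partial E}{\partial z_j} \;=\; y_j - t_j
\quad\text{(linear output unit)}, \qquad
\frac{\partial E}{\partial w_{ij}} \;=\; a_i\,\delta_j, \qquad
\delta_i \;=\; \sigma'(z_i) \sum_{j} w_{ij}\,\delta_j
\quad\text{(sigmoid hidden unit)}
```

Here a_i is the activation feeding weight w_{ij}, and each hidden-unit delta reuses the deltas of the layer above, which is the "backward propagation of errors."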

Page 41: Artificial Neural Network (draft)
Page 42: Artificial Neural Network (draft)
Page 43: Artificial Neural Network (draft)
Page 44: Artificial Neural Network (draft)

Back Propagation can be used to train a neural net with which of the following activation functions?

Logistic (sigmoid)

Linear

Binary threshold neurons (Perceptron)

Hyperbolic Tangent (tanh)

Page 45: Artificial Neural Network (draft)

Review Gradient Descent
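
The update being reviewed is the standard rule w <- w - η ∂E/∂w, applied repeatedly. A minimal sketch on a one-parameter cost (the quadratic cost, starting point, and learning rate are assumed toy values):

```python
# Gradient descent on a toy cost E(w) = (w - 3)^2, whose gradient is 2 * (w - 3).

def grad(w):
    return 2.0 * (w - 3.0)

w, learning_rate = 0.0, 0.1
for _ in range(50):
    w -= learning_rate * grad(w)   # w <- w - eta * dE/dw

print(w)   # approaches the minimum at w = 3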

Page 46: Artificial Neural Network (draft)
Page 47: Artificial Neural Network (draft)

Selecting Hyper-Parameters

Generally, we use trial and error (with cross-validation) to select hyperparameters; a sketch follows the list below.

What learning rate?

Momentum?

How many layers?

How many nodes / layer?

Regularization coefficient?

Activation function(s)?
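
A minimal sketch of this trial-and-error search (grid search with cross-validation). The grid values are assumptions, and `evaluate` is a hypothetical stand-in for training an ANN on a fold and returning its validation score:

```python
import random
from itertools import product

def evaluate(hyperparams, fold):
    """Hypothetical placeholder: train an ANN with these hyperparameters on the
    fold's training split and return accuracy on its validation split."""
    return random.random()  # stand-in score; a real version would train a net

def cross_val_score(hyperparams, folds=5):
    return sum(evaluate(hyperparams, fold) for fold in range(folds)) / folds

grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "hidden_units": [10, 50, 100],
    "l2_coefficient": [0.0, 1e-4, 1e-2],
}

best = max(
    (dict(zip(grid, values)) for values in product(*grid.values())),
    key=cross_val_score,
)
print("best hyperparameters:", best)
```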

Page 48: Artificial Neural Network (draft)

Regularization

• Without some form of regularization, large ANNs are prone to overfitting

• ANNs can approximate any function, so they can fit the noise in the training data set

• One traditional solution is L2 regularization: we modify our error function by including a penalty term for every weight in the matrix (sketched below)

• L2 regularization drives the weights towards 0

• As the weights approach zero, the sigmoid function behaves more linearly

• Recently, new forms of regularization have improved ANN learning
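
In symbols, with regularization coefficient λ (the standard L2 / weight-decay form; the slide's exact expression is not reproduced here):

```latex
E_{\text{reg}} = E + \frac{\lambda}{2} \sum_{i,j} w_{ij}^{2},
\qquad
\frac{\partial E_{\text{reg}}}{\partial w_{ij}} = \frac{\partial E}{\partial w_{ij}} + \lambda\, w_{ij}
```

The extra λw term in the gradient is what steadily drives each weight towards 0 (weight decay).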

Page 49: Artificial Neural Network (draft)

New regularizers

• Force neurons to share weights

Page 50: Artificial Neural Network (draft)

Learning Curves – Overfitting - regularization

Page 51: Artificial Neural Network (draft)

NN libraries

• Theano (python)

• PyLearn2 (python)

• Torch (Lua)

• Deep Learning Toolbox (MATLAB)

• Numenta (python)

• nnet (R)

Page 52: Artificial Neural Network (draft)

Do a walkthrough with ConvNetJS

Page 53: Artificial Neural Network (draft)

Google Trends “Neural Network” searches

Page 54: Artificial Neural Network (draft)

Google Trends “Random Forests” searches

Page 55: Artificial Neural Network (draft)

Google Trends “Deep Learning” searches

Page 56: Artificial Neural Network (draft)

Addressing ANNs' weaknesses: averaging many models

• Unlike random forests (which average many decision trees), creating many neural network models has not been feasible

• Averaging models is important because it prevents overfitting

• NN dropout (2012) provides a way to average many models without having to train them separately
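
A minimal sketch of the dropout idea at the level of a single layer's activations (the "inverted dropout" scaling used here is an assumed convention, not necessarily the 2012 paper's exact formulation):

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True, rng=np.random.default_rng(0)):
    """Randomly zero a fraction p_drop of the activations during training.

    Each forward pass then runs a different "thinned" sub-network, so training
    implicitly averages over many models that share weights. Scaling the kept
    activations by 1 / (1 - p_drop) keeps their expected value unchanged, so no
    extra work is needed at test time.
    """
    if not training or p_drop == 0.0:
        return activations
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = np.array([0.2, 0.9, 0.4, 0.7])
print(dropout(h, p_drop=0.5))
```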

Page 57: Artificial Neural Network (draft)

Provide motivation for Deep Learning

Page 58: Artificial Neural Network (draft)
Page 59: Artificial Neural Network (draft)
Page 60: Artificial Neural Network (draft)
Page 61: Artificial Neural Network (draft)