Neural Networks

Upload: anila

Post on 05-Jan-2016

TRANSCRIPT

Page 1: Neural  Networks

Neural Networks

Page 2: Neural  Networks

What are they

• Models of the human brain used for computational purposes

• Brain is made up of many interconnected neurons

Page 3: Neural  Networks

What is a neuron

Page 4: Neural  Networks

Components of biological neuron

• Dendrites – serve as inputs

• Soma – the cell body of the neuron, which contains the nucleus

• Nucleus – the processing component of the neuron

• Axon – along which the output travels

• Synapse – the ending across whose gap connections are made to other neurons

Page 5: Neural  Networks

How does it work

• Signals move from neuron to neuron via electrochemical reactions. The synapses release a chemical transmitter which enters the dendrite. This raises or lowers the electrical potential of the cell body.

• The soma sums the inputs it receives and once a threshold level is reached an electrical impulse is sent down the axon (often known as firing).

• These impulses eventually reach synapses and the cycle continues.

Page 6: Neural  Networks

Synapses

• Synapses which raise the potential within a cell body are called excitatory. Synapses which lower the potential are called inhibitory.

• It has been found that synapses exhibit plasticity. This means that long-term changes in the strengths of the connections can be formed depending on the firing patterns of other neurons. This is thought to be the basis for learning in our brains.

Page 7: Neural  Networks

Artificial model of neuron

Page 8: Neural  Networks

Diagram

• aj : activation value of unit j

• wj,i : weight on the link from unit j to unit i

• ini : weighted sum of inputs to unit i

• ai : activation value of unit i (also known as the output value)

• g : activation function

Page 9: Neural  Networks

How does this work

• A neuron is connected to other neurons via its input and output links. Each incoming neuron has an activation value and each connection has a weight associated with it.

• The neuron sums the incoming weighted values and this value is input to an activation function. The output of the activation function is the output from the neuron.
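The sum-then-activate step just described can be sketched as follows. This is a minimal illustration, not code from the slides; the weights and inputs are arbitrary, and the sigmoid (introduced later in the deck) is used as the activation function g.

```python
import math

def neuron_output(activations, weights):
    """One artificial neuron: weighted sum of the incoming
    activation values, passed through an activation function."""
    total = sum(a * w for a, w in zip(activations, weights))  # in_i = sum_j wj,i * aj
    return 1.0 / (1.0 + math.exp(-total))                     # ai = g(in_i), sigmoid here

# Example: a neuron with two incoming links
print(neuron_output([1.0, 0.0], [0.5, -0.5]))  # sigmoid(0.5) ≈ 0.622
```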

Page 10: Neural  Networks

Common Activation Functions

Page 11: Neural  Networks

Some common activation functions in more detail

• These functions can be defined as follows:

• Stept(x) = 1 if x >= t, else 0

• Sign(x) = +1 if x >= 0, else -1

• Sigmoid(x) = 1/(1 + e^-x)

• On occasion an identity function is also used (i.e. where the input to the neuron becomes the output). This function is normally used in the input layer, where the inputs to the neural network are passed into the network unchanged.
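The four definitions above translate directly into code. A small sketch (function names are mine, chosen to mirror the slide's notation):

```python
import math

def step(x, t=0.0):
    """Step_t(x): 1 if x >= t, else 0."""
    return 1 if x >= t else 0

def sign(x):
    """Sign(x): +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def identity(x):
    """Identity: the input becomes the output (input-layer use)."""
    return x

print(step(0.3), sign(-2.0), sigmoid(0.0))  # 1 -1 0.5
```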

Page 12: Neural  Networks

A brief history of Neural Networks

• In 1943 two scientists, Warren McCulloch and Walter Pitts, proposed the first artificial model of a biological neuron [McC]. This synthetic neuron is still the basis for most of today’s neural networks.

• Rosenblatt then came up with his two-layered perceptron, whose limitations were subsequently exposed by Minsky and Papert, which led to a huge decline in funding and interest in neural networks.

Page 13: Neural  Networks

The bleak years

• During this period, even though there was a lack of funding and interest in neural networks, a small number of researchers continued to investigate the potential of neural models.

• A number of papers were published, but none had any great impact. Many of these reports concentrated on the potential of neural networks for aiding in the explanation of biological behaviour (e.g. [Mal], [Bro], [Mar], [Bie], [Coo]).

• Others focused on real world implementations. In 1972 Teuvo Kohonen and James A. Anderson independently proposed the same model for associative memory [Koh], [An1]

• In 1976 Marr and Poggio applied a neural network to a realistic problem in computational vision, stereopsis [Mar]. Other projects included [Lit], [Gr1], [Gr2], [Ama], [An2], [McC].

Page 14: Neural  Networks

The Discovery of Backpropagation

• The backpropagation learning algorithm was developed independently by Rumelhart [Ru1], [Ru2], Le Cun [Cun] and Parker [Par] in 1986.

• It was subsequently discovered that the algorithm had also been described by Paul Werbos in his Harvard Ph.D thesis in 1974 [Wer].

• Error backpropagation networks are the most widely used neural network model as they can be applied to almost any problem that requires pattern mapping.

• It was the discovery of this paradigm that brought neural networks out of the research area and into real world implementation.

Page 15: Neural  Networks

Interest in neural networks differs according to profession.

• Neurobiologists and psychologists -understanding our brain

• Engineers and physicists - a tool to recognise patterns in noisy data (see the Ts at right)

• Business analysts and engineers -a tool for modelling data

• Computer scientists and mathematicians - networks offer an alternative model of computing: machines that may be taught rather than programmed

• Artificial intelligence researchers, cognitive scientists and philosophers - subsymbolic processing (reasoning with patterns, not symbols)

Page 16: Neural  Networks

Backpropagation Network Architecture

• A backpropagation network typically consists of three or more layers of nodes.

• The first layer is known as the input layer and the last layer is known as the output layer.

• Any layers of nodes in between the input and output layers are known as hidden layers.

• Each unit in a layer is connected to every unit in the next layer. There are no connections between units within the same layer.
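This layered, fully connected structure can be sketched as one weight matrix per pair of adjacent layers. A minimal illustration (the helper name and layer sizes are mine, not from the slides):

```python
import random

def make_network(layer_sizes):
    """Fully connected feedforward net: one weight matrix per pair of
    adjacent layers; every unit connects to every unit in the next
    layer, and there are no connections within a layer."""
    weights = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # n_in x n_out matrix of small random initial weights
        weights.append([[random.uniform(-0.5, 0.5) for _ in range(n_out)]
                        for _ in range(n_in)])
    return weights

# A 2-input, 2-hidden-unit, 1-output backpropagation network
w = make_network([2, 2, 1])
print(len(w), len(w[0]), len(w[0][0]))  # 2 weight matrices; the first is 2x2
```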

Page 17: Neural  Networks

Backpropagation

[Diagram: INPUT flows forward through the Input, Hidden and Output layers; ERROR propagates backward through them]

Page 18: Neural  Networks

Operation of the network

• The operation of the network consists of a forward pass of the input through the network (forward propagation) and then a backward pass of an error value which is used in the weight modification (Backward Propagation)

Page 19: Neural  Networks

Forward Propagation

• A forward propagation step is initiated when an input pattern is presented to the network.

• No processing is performed at the input layer. The pattern is propagated forward to the next layer, and each node in this layer performs a weighted sum of all its inputs.

• After this sum has been calculated, a function is used to compute the unit’s output.
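The forward pass described in these three bullets can be sketched as follows. The weights are illustrative values of my own, not the example network used later in the deck:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(pattern, weight_matrices):
    """Forward propagation: no processing at the input layer; each
    node in later layers takes a weighted sum of the previous layer's
    outputs, then applies the activation function."""
    activations = list(pattern)  # input layer: the pattern, unchanged
    for W in weight_matrices:    # one matrix per layer-to-layer step
        activations = [sigmoid(sum(a * W[j][i] for j, a in enumerate(activations)))
                       for i in range(len(W[0]))]
    return activations           # the output layer's activation pattern

# Two inputs, a 2-unit hidden layer, one output unit (illustrative weights)
W1 = [[1.0, -1.0], [-1.0, 1.0]]
W2 = [[1.0], [-1.0]]
print(forward([1, 0], [W1, W2]))
```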

Page 20: Neural  Networks
Page 21: Neural  Networks

Example XOR

Page 22: Neural  Networks
Page 23: Neural  Networks

Layers of the Network

The Input Layer

• The input layer of a backpropagation network acts solely as a buffer to hold the patterns being presented to the network. Each node in the input layer corresponds to one entry in the pattern. No processing is done at the input layer. The pattern is fed forward from the input layer to the next layer.

Page 24: Neural  Networks

The Hidden Layers

• It is the hidden layers which give the backpropagation network its exceptional computational abilities.

• The units in the hidden layers act as “feature detectors”. They extract information from the input patterns which can be used to distinguish between particular classes. The network creates its own internal representation of the data.

Page 25: Neural  Networks

The Output Layer

• The output layer of a network uses the response of the feature detectors in the hidden layer. Each unit in the output layer emphasises each feature according to the values of the connecting weights. The pattern of activation at this layer is taken as the network’s response.

Page 26: Neural  Networks

The sigmoid function

• The function used to perform this operation is the sigmoid function, F(x) = 1/(1 + e^-x).

• The main reason why this particular function is chosen is that its derivative, which is used in the learning law, is easily computed.

• The result obtained after applying this function to the net input is taken to be the node’s output value.

• This process is continued until the pattern has been propagated through the entire network and reaches the output layer.

• The activation pattern at the output layer is taken as the network’s result.
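The "easily computed derivative" claim above rests on the identity F'(x) = F(x)(1 - F(x)): once a node's output has been computed in the forward pass, the derivative needed by the learning law is almost free. A small check of the identity:

```python
import math

def sigmoid(x):
    """F(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    """F'(x) = F(x) * (1 - F(x)): the derivative reuses the
    already-computed node output, which is why the sigmoid is
    convenient in the learning law."""
    fx = sigmoid(x)
    return fx * (1.0 - fx)

# Compare against a numerical central-difference derivative
h = 1e-6
numeric = (sigmoid(0.5 + h) - sigmoid(0.5 - h)) / (2 * h)
print(abs(sigmoid_deriv(0.5) - numeric) < 1e-9)  # True
```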

Page 27: Neural  Networks

Linear Separability and the XOR Problem

• Consider two-input patterns being classified into two classes.

• Each point, marked with one symbol per class, represents a pattern with a pair of input values. Each pattern is classified into one of two classes.

• Notice that these classes can be separated with a single line. They are known as linearly separable patterns.

• Linear separability refers to the fact that classes of patterns with n-dimensional input vectors can be separated with a single decision surface. In the case above, the line represents the decision surface.

Page 28: Neural  Networks

Diagram

Page 29: Neural  Networks

Xor

• The classic example of a linearly inseparable pattern is the logical exclusive-OR (XOR) function. The next figure illustrates that the two XOR classes, 0 (black dots) and 1 (white dots), cannot be separated with a single line.
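The inseparability claim is easy to probe by brute force: a single threshold unit draws the line w1*x1 + w2*x2 + b = 0, and no choice of (w1, w2, b) puts the two XOR classes on opposite sides. A small sketch of that search (the grid and helper name are mine):

```python
from itertools import product

# XOR truth table: inputs -> class
xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def separates(w1, w2, b):
    """Does the line w1*x1 + w2*x2 + b = 0 put class 1 strictly on
    one side and class 0 on the other (a single threshold unit)?"""
    return all((w1 * x1 + w2 * x2 + b > 0) == (cls == 1)
               for (x1, x2), cls in xor.items())

# Coarse grid search over candidate lines: none separates XOR
grid = [x / 4 for x in range(-8, 9)]
found = any(separates(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print(found)  # False: no single line separates XOR
```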

Page 30: Neural  Networks

XOR linearly inseparable

Page 31: Neural  Networks

The Significance of This

• XOR is separable in 3 dimensions but obviously not in 2.

• So many classifiers will need more than 2 layers to classify such patterns.

• Minsky and Papert pointed out that, as far as they could see, perceptrons of 2 layers could not learn such problems in 3 dimensions or more.

• Because so many problems are like XOR, these leading figures of AI concluded that neural networks had limited applicability.

Page 32: Neural  Networks

But they were wrong

• Backpropagation showed that neural networks could learn in 3 and more dimensions

• However, such was the stature of this pair that their verdict impacted negatively on neural network research for two decades.

• The work of Werbos, Parker and Rumelhart eventually proved them wrong; by 1987 working multilayer networks were learning, and they have since become a huge industry.

Page 33: Neural  Networks

Backward Propagation

• The first step in the backpropagation stage is the calculation of the error between the network’s result and the desired response. This occurs when the forward propagation phase is completed.

• Each processing unit in the output layer is compared to its corresponding entry in the desired pattern and an error is calculated for each node in the output layer.

• The weights are then modified for all of the connections going into the output layer.

• Next, the error is backpropagated to the hidden layers and by using the generalised delta rule, the weights are adjusted for all connections going into the hidden layer.

• The procedure continues until the last layer of weights has been modified. The forward and backward propagation phases are repeated until the network's output is equal to the desired result.
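The whole forward/backward loop described above can be sketched end to end. This is a minimal illustration, not the deck's worked example: it uses the common textbook sign convention (error term taken as target minus output), adds bias weights, and trains a 2-2-1 network on XOR. The initial weights and the epoch count are arbitrary choices of mine.

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# 2-2-1 network with small random weights and biases (illustrative)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [random.uniform(-1, 1) for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = random.uniform(-1, 1)

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR
alpha = 0.5  # learning rate

def total_error():
    err = 0.0
    for x, t in data:
        h = [sigmoid(x[0]*W1[0][j] + x[1]*W1[1][j] + b1[j]) for j in range(2)]
        o = sigmoid(h[0]*W2[0] + h[1]*W2[1] + b2)
        err += (t - o) ** 2
    return err

before = total_error()
for _ in range(5000):
    for x, t in data:
        # forward pass
        h = [sigmoid(x[0]*W1[0][j] + x[1]*W1[1][j] + b1[j]) for j in range(2)]
        o = sigmoid(h[0]*W2[0] + h[1]*W2[1] + b2)
        # output-layer error term (generalised delta rule, t - o form)
        d_out = o * (1 - o) * (t - o)
        # hidden-layer error terms, backpropagated through W2
        d_hid = [h[j] * (1 - h[j]) * W2[j] * d_out for j in range(2)]
        # adjust the weights into the output layer, then the hidden layer
        for j in range(2):
            W2[j] += alpha * h[j] * d_out
        b2 += alpha * d_out
        for i in range(2):
            for j in range(2):
                W1[i][j] += alpha * x[i] * d_hid[j]
        for j in range(2):
            b1[j] += alpha * d_hid[j]

print(before, "->", total_error())  # the error shrinks as the network learns
```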

Page 34: Neural  Networks

The Backpropagation Learning Law

• The Learning Law used is known as the Generalised Delta Rule.

• It allows for the adjustment of the weights in the hidden layer, a feat deemed impossible by Minsky and Papert.

• It uses the derivative of the activation function of nodes (which in most cases is the sigmoid function) to determine the extent of the adjustment to the weights connecting to the hidden layers.

• In other words, the network learns from its errors, using the difference between expected and actual results (the error) to make adjustments.
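In symbols, the generalised delta rule described above is commonly written as follows. This is the textbook form, with the error factor taken as target minus output; note that the worked example later in this deck takes output minus target, so its d values carry the opposite sign.

```latex
\delta_k = O_k\,(1 - O_k)\,(T_k - O_k)                % error term for output unit k
\delta_j = h_j\,(1 - h_j)\sum_{k} w_{jk}\,\delta_k    % error term for hidden unit j
\Delta w_{jk} = \alpha\, h_j\, \delta_k               % adjustment of the weight from j to k
```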

Page 35: Neural  Networks

Example

• Calculate the weight adjustments in the following network, where the target values are {1, 1} and the learning rate is 1.

Page 36: Neural  Networks

Sample Neural Network

Page 37: Neural  Networks

Hidden Layer Computation

• Xi = i·W1, with inputs i = {1, 0} and first-layer weights W1 = [[1, -1], [-1, 1]]

• Xi1 = 1 × 1 + 0 × (-1) = 1

• Xi2 = 1 × (-1) + 0 × 1 = -1

• Xi = {Xi1, Xi2} = {1, -1}

where F(x) = 1/(1 + e^-x) is the activation function applied at each node.

Page 38: Neural  Networks

• h = F(Xi)

• h1 = F(Xi1) = F(1) = 1/(1 + e^-1) = 0.73

• h2 = F(Xi2) = F(-1) = 1/(1 + e^1) = 0.27

Page 39: Neural  Networks

Output Layer Computation

• X = h·W2

• X1 = 0.73 × (-1) + 0.27 × 0 = -0.73

• X2 = 0.73 × 0 + 0.27 × (-1) = -0.27

• X = {X1, X2} = {-0.73, -0.27}

Page 40: Neural  Networks

• O = F(X)

• O1 = F(X1) = 1/(1 + e^-0.73) = 0.68

• O2 = F(X2) = 1/(1 + e^-0.27) = 0.57

Page 41: Neural  Networks

Error

• Using the outputs rounded to 0.7 and 0.6, the error term for each output unit is d = O(1 - O)(O - T):

• d1 = 0.7(1 - 0.7)(0.7 - 1) = 0.7(0.3)(-0.3) = -0.063

• d2 = 0.6(1 - 0.6)(0.6 - 1) = 0.6(0.4)(-0.4) = -0.096

Page 42: Neural  Networks

Error Calculation

• e = h(1 - h)·W2·d, computed elementwise for the hidden units:

• e1 = h1(1 - h1)(W11·d1 + W12·d2)

• e2 = h2(1 - h2)(W21·d1 + W22·d2)

Page 43: Neural  Networks

Another Way to write the error

• e1 = h1(1 - h1)(W11·d1 + W12·d2)
• e2 = h2(1 - h2)(W21·d1 + W22·d2)

• e1 = 0.73(1 - 0.73) × ((-1 × -0.063) + (0 × -0.096))
• e2 = 0.27(1 - 0.27) × ((0 × -0.063) + (-1 × -0.096))

• e1 = 0.73 × 0.27 × 0.063 = 0.1971 × 0.063 = 0.0124
• e2 = 0.27 × 0.73 × 0.096 = 0.1971 × 0.096 = 0.0189

• In general, for hidden unit h: e_h = h_h(1 - h_h) Σk∈outputs W_hk d_k

Page 44: Neural  Networks

Weight Adjustment

• ΔW2_t = α·h·d + Θ·ΔW2_(t-1)

• where α = 1 and the Θ term carries over a fraction of the previous weight change (momentum)

• h·d is the outer product of the hidden outputs with the output error terms:

• h·d = [0.73; 0.27] × [-0.063  -0.096]

• = [0.73 × (-0.063)   0.73 × (-0.096)]
    [0.27 × (-0.063)   0.27 × (-0.096)]

Page 45: Neural  Networks

Weight Change

• With α = 1 and no previous weight change, ΔW2 = h·d:

• ΔW2 = [-0.046  -0.070]
        [-0.017  -0.026]
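The output-layer part of this worked example can be checked in a few lines. The sketch below uses the deck's own rounded figures, h = (0.73, 0.27) and O = (0.7, 0.6), its targets T = (1, 1), and its sign convention d = O(1 - O)(O - T):

```python
# Reproduce the worked example's output-layer numbers.
h = [0.73, 0.27]   # hidden-layer outputs from the example
O = [0.7, 0.6]     # network outputs, rounded as in the example
T = [1.0, 1.0]     # target values

# Error term per output unit, with the deck's d = O(1 - O)(O - T) convention
d = [o * (1 - o) * (o - t) for o, t in zip(O, T)]
print(d)  # approximately [-0.063, -0.096]

# Output-layer weight adjustment: the outer product h x d (alpha = 1)
dW2 = [[hi * dk for dk in d] for hi in h]
for row in dW2:
    print([round(v, 3) for v in row])
# [-0.046, -0.07]
# [-0.017, -0.026]
```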