Artificial Neural Networks
The Brain
How do brains work?
How do human brains differ from those of other animals?
Can we base models of artificial intelligence on the structure and inner workings of the brain?
The Brain
The human brain consists of:
Approximately 10 billion neurons
...and 60 trillion connections (synapses)
The brain is a highly complex, nonlinear, parallel information-processing system
By firing neurons simultaneously, the brain performs faster than the fastest computers in existence today
[Figure: two connected biological neurons, labeled Soma, Dendrites, Axon, and Synapse]
An individual neuron has a very simple structure:
The cell body is called a soma
Small connective fibers are called dendrites
Single long fibers are called axons
An army of such elements constitutes tremendous processing power
Artificial Neural Networks
An artificial neural network consists of a number of very simple processors called neurons
Neurons are connected by weighted links
The links pass signals from one neuron to another based on predefined thresholds
Artificial Neural Networks
An individual neuron (McCulloch & Pitts, 1943):
Computes the weighted sum of the input signals
Compares the result with a threshold value
If the net input is less than the threshold, the neuron output is –1 (or 0)
Otherwise, the neuron becomes activated and its output is +1
Artificial Neural Networks
[Figure: a single neuron Y with input signals x1, x2, ..., xn, weights w1, w2, ..., wn, a threshold, and output signals Y]
X = x1w1 + x2w2 + ... + xnwn
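The neuron described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `neuron_output`, the `low` parameter (to choose between the –1 and 0 conventions), and the example inputs and weights are all assumptions made for the example.

```python
def neuron_output(inputs, weights, threshold, low=-1):
    """Threshold neuron: weighted sum of the inputs compared with a threshold.

    Returns `low` (-1 or 0) if the net input is below the threshold,
    otherwise +1, following the rule stated on the slide.
    """
    net_input = sum(x * w for x, w in zip(inputs, weights))
    return low if net_input < threshold else 1


# Hypothetical example values (not taken from the slides).
inputs = [1, 0, 1]
weights = [0.5, 0.9, -0.2]          # net input X = 0.5 + 0.0 - 0.2 = 0.3

print(neuron_output(inputs, weights, threshold=0.2))          # X is above the threshold
print(neuron_output(inputs, weights, threshold=0.4))          # X is below the threshold
print(neuron_output(inputs, weights, threshold=0.4, low=0))   # same, with the 0 convention
```

Only the comparison against the threshold decides the output, so scaling all weights and the threshold by the same positive factor leaves the neuron's behaviour unchanged.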
Activation Functions
Individual neurons adhere to an activation function, which determines whether they propagate their signal (i.e. activate) or not:
[Figure: Sign Function]
The step and sign activation functions are also often called hard limit functions; the sigmoid function is a smooth alternative with outputs between 0 and 1
We use hard limit functions in decision-making neural networks
They support classification and other pattern recognition tasks
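The three activation functions named on this slide can be written out as follows. This is a sketch under one assumption: a net input of exactly 0 counts as "activated", which matches the earlier rule that the output is low only when the net input is less than the threshold.

```python
import math


def step(x):
    """Step function: 1 if x >= 0, otherwise 0 (hard limiter)."""
    return 1 if x >= 0 else 0


def sign(x):
    """Sign function: +1 if x >= 0, otherwise -1 (hard limiter)."""
    return 1 if x >= 0 else -1


def sigmoid(x):
    """Sigmoid function: smooth output strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-2.0, 0.0, 2.0):
    print(x, step(x), sign(x), round(sigmoid(x), 4))
```

The hard limiters return one of two fixed values, while the sigmoid changes gradually and is differentiable, which is why it is the one used later for back-propagation.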
Perceptrons
Can an individual neuron learn?
In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a single-node neural network
Rosenblatt’s perceptron model consists of a single neuron with adjustable synaptic weights, followed by a hard limiter
Perceptrons
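[Figure: a two-input perceptron; inputs x1 and x2 with weights w1 and w2 feed a linear combiner, followed by a hard limiter with a threshold that produces output Y]
X = x1w1 + x2w2
Y = Ystep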
Perceptrons
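A perceptron:
Classifies inputs x1, x2, ..., xn into one of two distinct classes A1 and A2
Forms a linearly separable function defined by:
x1w1 + x2w2 + ... + xnwn – Θ = 0
[Figure: (a) Two-input perceptron: the line x1w1 + x2w2 – Θ = 0 separates Class A1 (region 1) from Class A2 (region 2) in the (x1, x2) plane. (b) Three-input perceptron: the plane x1w1 + x2w2 + x3w3 – Θ = 0 separates regions 1 and 2 in (x1, x2, x3) space.]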
Perceptrons
A perceptron with three inputs x1, x2, and x3 classifies its inputs into two distinct sets A1 and A2
Perceptrons
How does a perceptron learn?
A perceptron has initial (often random) weights, typically in the range [-0.5, 0.5]
Apply an established training dataset
Calculate the error as expected output minus actual output:
error e = Yexpected – Yactual
Adjust the weights to reduce the error
Perceptrons
How do we adjust a perceptron’s weights to produce Yexpected?
If e is positive, we need to increase Yactual (and vice versa)
Use this formula:
wi = wi + Δwi, where Δwi = α × xi × e
and
α is the learning rate (between 0 and 1)
e is the calculated error
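A single application of this update rule might look like the sketch below. It assumes the standard perceptron rule, in which each weight changes by α × xi × e; the function name `update_weights` and the example numbers are illustrative only.

```python
def update_weights(weights, inputs, expected, actual, alpha):
    """Apply one perceptron update: w_i <- w_i + alpha * x_i * e."""
    error = expected - actual
    return [w + alpha * x * error for w, x in zip(weights, inputs)]


# Hypothetical case: the perceptron fired (actual = 1) but should not have (expected = 0).
old_weights = [0.3, -0.1]
new_weights = update_weights(old_weights, inputs=[1, 0], expected=0, actual=1, alpha=0.1)
print(new_weights)
```

Here e is negative, so the weight on the active input x1 decreases, while the weight on x2 is untouched because x2 = 0 contributed nothing to the output.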
Perceptron Example – AND
Train a perceptron to recognize logical AND
Use threshold Θ = 0.2 and learning rate α = 0.1
Repeat until convergence, i.e. the final weights do not change and there is no error
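The AND training run can be sketched as follows, using Θ = 0.2 and α = 0.1 from the slide. The initial weights (0.3 and -0.1) are an assumption, since the slide's table did not survive; the step function is assumed to fire when X – Θ ≥ 0. Exact fractions are used instead of floats so that borderline cases such as X = Θ are not disturbed by rounding.

```python
from fractions import Fraction


def step(x):
    return 1 if x >= 0 else 0


def train_perceptron(samples, weights, theta, alpha, max_epochs=100):
    """Train until one full epoch has zero error.

    Returns (weights, epochs), with epochs = None if max_epochs is reached.
    """
    weights = list(weights)
    for epoch in range(1, max_epochs + 1):
        any_error = False
        for inputs, expected in samples:
            net = sum(x * w for x, w in zip(inputs, weights))
            actual = step(net - theta)
            error = expected - actual
            if error != 0:
                any_error = True
                # Perceptron rule: w_i <- w_i + alpha * x_i * e
                weights = [w + alpha * x * error for w, x in zip(weights, inputs)]
        if not any_error:
            return weights, epoch
    return weights, None


AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
theta = Fraction("0.2")
alpha = Fraction("0.1")
initial = [Fraction("0.3"), Fraction("-0.1")]   # assumed starting weights

weights, epochs = train_perceptron(AND_SAMPLES, initial, theta, alpha)
predictions = [
    step(sum(x * w for x, w in zip(inputs, weights)) - theta)
    for inputs, _ in AND_SAMPLES
]
print([float(w) for w in weights], epochs, predictions)
```

The `max_epochs` cap is a safeguard added for this sketch; it only matters for problems that a single perceptron cannot learn.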
Perceptron Example – AND
Two-dimensional plot of logical AND operation:
[Figure: two-dimensional plots in the (x1, x2) plane of (a) AND, (b) OR, and (c) Exclusive-OR]
A single perceptron can be trained to recognize any linearly separable function
Can we train a perceptron to recognize logical OR?
How about logical exclusive-OR (i.e. XOR)?
Perceptron – OR and XOR
Two-dimensional plots of logical OR and XOR:
[Figure: two-dimensional plots in the (x1, x2) plane of (b) OR and (c) Exclusive-OR]
Perceptron Coding Exercise
Write code to:
Calculate the error at each step
Modify weights, if necessary, i.e. if the error is non-zero
Loop until all error values are zero for a full epoch
Modify your code to learn to recognize the logical OR operation
Try to recognize the XOR operation
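One possible shape for this exercise is sketched below; it is not the expected solution. It assumes the same threshold and learning rate as the AND example (Θ = 0.2, α = 0.1), assumed initial weights of 0.3 and -0.1, and an arbitrary cap of 100 epochs so that the XOR attempt terminates.

```python
from fractions import Fraction

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
THETA, ALPHA = Fraction("0.2"), Fraction("0.1")


def epochs_to_converge(targets, max_epochs=100):
    """Return the first epoch in which every error is zero, or None if never."""
    weights = [Fraction("0.3"), Fraction("-0.1")]   # assumed starting weights
    for epoch in range(1, max_epochs + 1):
        errors = []
        for inputs, expected in zip(INPUTS, targets):
            net = sum(x * w for x, w in zip(inputs, weights))
            actual = 1 if net - THETA >= 0 else 0
            error = expected - actual
            errors.append(error)
            # A zero error leaves the weights unchanged.
            weights = [w + ALPHA * x * error for w, x in zip(weights, inputs)]
        if not any(errors):
            return epoch
    return None


or_epochs = epochs_to_converge([0, 1, 1, 1])
xor_epochs = epochs_to_converge([0, 1, 1, 0])
print("OR converged after epoch:", or_epochs)
print("XOR converged after epoch:", xor_epochs)
```

OR is linearly separable, so the loop ends with an error-free epoch. XOR is not, so no single set of weights can classify all four inputs correctly and the function gives up at the cap, which motivates the multilayer networks that follow.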
Multilayer Neural Networks
[Figure: a multilayer network with an input layer, a middle layer, and an output layer; input signals enter on one side and output signals leave on the other]
Multilayer neural networks consist of:
An input layer of source neurons
One or more hidden layers of computational neurons
An output layer of more computational neurons
Input signals are propagated in a layer-by-layer feedforward manner
Multilayer Neural Networks
[Figure: a multilayer network with an input layer, a first hidden layer, a second hidden layer, and an output layer; input signals enter on one side and output signals leave on the other]
Multilayer Neural Networks
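[Figure: a three-layer network with input layer neurons 1, 2, ..., i, ..., n (inputs x1, x2, ..., xi, ..., xn), hidden layer neurons 1, 2, ..., j, ..., m, and output layer neurons 1, 2, ..., k, ..., l (outputs y1, y2, ..., yk, ..., yl); weights wij link the input and hidden layers and weights wjk link the hidden and output layers; input signals flow forward and error signals flow backward]
XINPUT = x1
XH = x1w11 + x2w21 + ... + xiwi1 + ... + xnwn1
XOUTPUT = yH1w11 + yH2w21 + ... + yHjwj1 + ... + yHmwm1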
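Multilayer Neural Networks
Three-layer network:
[Figure: a three-layer network with inputs x1 and x2 entering input-layer neurons 1 and 2, hidden-layer neurons 3 and 4, and output-layer neuron 5 with output y5; the connections carry weights w13, w14, w23, w24, w35, and w45]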
Commercial-quality neural networks often incorporate 4 or more layers
Each layer consists of about 10-1000 individual neurons
Experimental and research-based neural networks often use 5 or 6 (or more) layers
Overall, millions of individual neurons may be used
Back-Propagation NNs
A back-propagation neural network is a multilayer neural network that propagates error backwards through the network as it learns
Weights are modified based on the calculated error
Training is complete when the error is below a specified threshold, e.g. less than 0.001
[Figure: the three-layer back-propagation network with inputs x1 and x2, hidden-layer neurons 3 and 4, output-layer neuron 5 (output y5), and weights w13, w14, w23, w24, w35, and w45; input signals flow forward and error signals flow backward]
Back-Propagation NNs
Use the sigmoid activation function, and apply each threshold Θ by connecting a fixed input of -1 to a weight Θ
Initially: w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1, Θ3 = 0.8, Θ4 = -0.1 and Θ5 = 0.3.
Step 2: Activation
Activate the back-propagation neural network by applying inputs x1(p), x2(p), ..., xn(p) and desired outputs yd,1(p), yd,2(p), ..., yd,n(p).
(a) Calculate the actual outputs of the neurons in the hidden layer:
$$y_j(p) = \mathrm{sigmoid}\left[\sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \Theta_j\right]$$
where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.
Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in the output layer:
$$y_k(p) = \mathrm{sigmoid}\left[\sum_{j=1}^{m} x_{jk}(p)\, w_{jk}(p) - \Theta_k\right]$$
where m is the number of inputs of neuron k in the output layer.
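The activation step can be sketched for a network with one hidden layer as below. The data layout is an assumption made for this example: `weights[i][j]` holds the weight from input i to neuron j, and `thresholds[j]` holds that neuron's threshold. The function names are illustrative as well.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_outputs(inputs, weights, thresholds):
    """Outputs of one layer: sigmoid(sum_i x_i * w_ij - threshold_j) for each neuron j."""
    return [
        sigmoid(sum(x * weights[i][j] for i, x in enumerate(inputs)) - thresholds[j])
        for j in range(len(thresholds))
    ]


def forward(inputs, w_hidden, theta_hidden, w_output, theta_output):
    """Step 2: (a) hidden-layer outputs, then (b) output-layer outputs."""
    hidden = layer_outputs(inputs, w_hidden, theta_hidden)
    outputs = layer_outputs(hidden, w_output, theta_output)
    return hidden, outputs


# Hypothetical network with all weights and thresholds at zero:
# every net input is 0, so every neuron outputs sigmoid(0) = 0.5.
hidden, outputs = forward([1, 0], [[0, 0], [0, 0]], [0, 0], [[0], [0]], [0])
print(hidden, outputs)
```

The outputs of the hidden layer serve as the inputs of the output layer, which is what the layer-by-layer feedforward description amounts to in code.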
We consider a training set where inputs x1 and x2 are equal to 1 and desired output yd,5 is 0. The actual outputs of neurons 3 and 4 in the hidden layer are calculated as
$$y_3 = \mathrm{sigmoid}(x_1 w_{13} + x_2 w_{23} - \Theta_3) = 1 / \left[1 + e^{-(1 \cdot 0.5 + 1 \cdot 0.4 - 1 \cdot 0.8)}\right] = 0.5250$$
$$y_4 = \mathrm{sigmoid}(x_1 w_{14} + x_2 w_{24} - \Theta_4) = 1 / \left[1 + e^{-(1 \cdot 0.9 + 1 \cdot 1.0 + 1 \cdot 0.1)}\right] = 0.8808$$
Now the actual output of neuron 5 in the output layer is determined as:
$$y_5 = \mathrm{sigmoid}(y_3 w_{35} + y_4 w_{45} - \Theta_5) = 1 / \left[1 + e^{-(-0.5250 \cdot 1.2 + 0.8808 \cdot 1.1 - 1 \cdot 0.3)}\right] = 0.5097$$
Thus, the following error is obtained:
$$e = y_{d,5} - y_5 = 0 - 0.5097 = -0.5097$$
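The forward pass of this worked example can be checked with a short script. The weights, thresholds, inputs, and desired output are the ones given in the slides; the variable names are chosen for this sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Values from the slides.
x1, x2, y_desired = 1, 1, 0
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
theta3, theta4, theta5 = 0.8, -0.1, 0.3

# Hidden layer (neurons 3 and 4), then output layer (neuron 5).
y3 = sigmoid(x1 * w13 + x2 * w23 - theta3)
y4 = sigmoid(x1 * w14 + x2 * w24 - theta4)
y5 = sigmoid(y3 * w35 + y4 * w45 - theta5)
error = y_desired - y5

for name, value in [("y3", y3), ("y4", y4), ("y5", y5), ("e", error)]:
    print(f"{name} = {value:.4f}")
```

Rounded to four decimal places, the printed values agree with the 0.5250, 0.8808, 0.5097, and -0.5097 shown in the worked example.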
04/20/23 Intelligent Systems and Soft Computing 36
Step 3: Weight training
Update the weights in the back-propagation network, propagating backward the errors associated with output neurons.
(a) Calculate the error gradient for the neurons in the output layer:
$$\delta_k(p) = y_k(p)\,[1 - y_k(p)]\, e_k(p)$$
where
$$e_k(p) = y_{d,k}(p) - y_k(p)$$
Calculate the weight corrections:
$$\Delta w_{jk}(p) = \alpha\, y_j(p)\, \delta_k(p)$$
Update the weights at the output neurons:
$$w_{jk}(p+1) = w_{jk}(p) + \Delta w_{jk}(p)$$
Step 3: Weight training (continued)
(b) Calculate the error gradient for the neurons in the hidden layer:
$$\delta_j(p) = y_j(p)\,[1 - y_j(p)] \sum_{k=1}^{l} \delta_k(p)\, w_{jk}(p)$$
Calculate the weight corrections:
$$\Delta w_{ij}(p) = \alpha\, x_i(p)\, \delta_j(p)$$
Update the weights at the hidden neurons:
$$w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)$$
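The gradient and weight-correction formulas of Step 3 can be sketched as three small functions. The names and the list layout (`w_output[j][k]` is the weight from hidden neuron j to output neuron k) are assumptions for this illustration, and the numbers in the usage example are hypothetical.

```python
def output_gradients(outputs, desired):
    """delta_k = y_k * (1 - y_k) * e_k, with e_k = y_d,k - y_k."""
    return [y * (1 - y) * (d - y) for y, d in zip(outputs, desired)]


def hidden_gradients(hidden, w_output, output_deltas):
    """delta_j = y_j * (1 - y_j) * sum_k delta_k * w_jk."""
    return [
        y * (1 - y) * sum(delta * w_output[j][k] for k, delta in enumerate(output_deltas))
        for j, y in enumerate(hidden)
    ]


def weight_corrections(signals, deltas, alpha):
    """Correction for the weight from signal i to neuron j: alpha * signal_i * delta_j."""
    return [[alpha * s * d for d in deltas] for s in signals]


# Hypothetical values: one output neuron, two hidden neurons, two inputs.
deltas_out = output_gradients([0.5], [1])
deltas_hid = hidden_gradients([0.5, 0.5], [[1.0], [-2.0]], deltas_out)
corrections = weight_corrections([1, 0], deltas_hid, alpha=0.5)
print(deltas_out, deltas_hid, corrections)
```

The hidden-layer gradients must be computed with the output-layer weights as they were before the update, so in a full training step the corrections are usually collected first and applied afterwards.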
The next step is weight training. To update the weights and threshold levels in our network, we propagate the error, e, from the output layer backward to the input layer.
First, we calculate the error gradient for neuron 5 in the output layer:
$$\delta_5 = y_5 (1 - y_5)\, e = 0.5097 \cdot (1 - 0.5097) \cdot (-0.5097) = -0.1274$$
Then we determine the weight corrections assuming that the learning rate parameter, α, is equal to 0.1:
$$\Delta w_{35} = \alpha\, y_3\, \delta_5 = 0.1 \cdot 0.5250 \cdot (-0.1274) = -0.0067$$
$$\Delta w_{45} = \alpha\, y_4\, \delta_5 = 0.1 \cdot 0.8808 \cdot (-0.1274) = -0.0112$$
$$\Delta\Theta_5 = \alpha \cdot (-1) \cdot \delta_5 = 0.1 \cdot (-1) \cdot (-0.1274) = 0.0127$$
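Next we calculate the error gradients for neurons 3 and 4 in the hidden layer:
$$\delta_3 = y_3 (1 - y_3)\, \delta_5\, w_{35} = 0.5250 \cdot (1 - 0.5250) \cdot (-0.1274) \cdot (-1.2) = 0.0381$$
$$\delta_4 = y_4 (1 - y_4)\, \delta_5\, w_{45} = 0.8808 \cdot (1 - 0.8808) \cdot (-0.1274) \cdot 1.1 = -0.0147$$
We then determine the weight corrections:
$$\Delta w_{13} = \alpha\, x_1\, \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038$$
$$\Delta w_{23} = \alpha\, x_2\, \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038$$
$$\Delta\Theta_3 = \alpha \cdot (-1) \cdot \delta_3 = 0.1 \cdot (-1) \cdot 0.0381 = -0.0038$$
$$\Delta w_{14} = \alpha\, x_1\, \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015$$
$$\Delta w_{24} = \alpha\, x_2\, \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015$$
$$\Delta\Theta_4 = \alpha \cdot (-1) \cdot \delta_4 = 0.1 \cdot (-1) \cdot (-0.0147) = 0.0015$$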
At last, we update all weights and thresholds:
$$w_{13} = w_{13} + \Delta w_{13} = 0.5 + 0.0038 = 0.5038$$
$$w_{14} = w_{14} + \Delta w_{14} = 0.9 - 0.0015 = 0.8985$$
$$w_{23} = w_{23} + \Delta w_{23} = 0.4 + 0.0038 = 0.4038$$
$$w_{24} = w_{24} + \Delta w_{24} = 1.0 - 0.0015 = 0.9985$$
$$w_{35} = w_{35} + \Delta w_{35} = -1.2 - 0.0067 = -1.2067$$
$$w_{45} = w_{45} + \Delta w_{45} = 1.1 - 0.0112 = 1.0888$$
$$\Theta_3 = \Theta_3 + \Delta\Theta_3 = 0.8 - 0.0038 = 0.7962$$
$$\Theta_4 = \Theta_4 + \Delta\Theta_4 = -0.1 + 0.0015 = -0.0985$$
$$\Theta_5 = \Theta_5 + \Delta\Theta_5 = 0.3 + 0.0127 = 0.3127$$
The training process is repeated until the sum of squared errors is less than 0.001.
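The whole training step of the worked example, from forward pass to updated weights, can be reproduced with the sketch below. All numeric inputs come from the slides; the dictionary layout and variable names are assumptions made for this example. Thresholds are treated as weights on a fixed input of -1, as described earlier.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


alpha = 0.1
x1, x2, y_desired = 1, 1, 0
w = {"13": 0.5, "14": 0.9, "23": 0.4, "24": 1.0, "35": -1.2, "45": 1.1}
theta = {"3": 0.8, "4": -0.1, "5": 0.3}

# Step 2: forward pass.
y3 = sigmoid(x1 * w["13"] + x2 * w["23"] - theta["3"])
y4 = sigmoid(x1 * w["14"] + x2 * w["24"] - theta["4"])
y5 = sigmoid(y3 * w["35"] + y4 * w["45"] - theta["5"])

# Step 3: error gradients, using the weights from before the update.
delta5 = y5 * (1 - y5) * (y_desired - y5)
delta3 = y3 * (1 - y3) * delta5 * w["35"]
delta4 = y4 * (1 - y4) * delta5 * w["45"]

# Weight and threshold updates (threshold input is fixed at -1).
new_w = {
    "13": w["13"] + alpha * x1 * delta3,
    "14": w["14"] + alpha * x1 * delta4,
    "23": w["23"] + alpha * x2 * delta3,
    "24": w["24"] + alpha * x2 * delta4,
    "35": w["35"] + alpha * y3 * delta5,
    "45": w["45"] + alpha * y4 * delta5,
}
new_theta = {
    "3": theta["3"] + alpha * (-1) * delta3,
    "4": theta["4"] + alpha * (-1) * delta4,
    "5": theta["5"] + alpha * (-1) * delta5,
}

for key, value in new_w.items():
    print(f"w{key} = {value:.4f}")
for key, value in new_theta.items():
    print(f"theta{key} = {value:.4f}")
```

This covers a single presentation of one training pair. A complete run would repeat the step over every pair in the training set, epoch after epoch, until the sum of squared errors drops below the chosen limit.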