TRANSCRIPT
Asst. Prof. Dr. Asaad Sabah Hadi
2018-2019
Artificial Neural Networks
Part-2
Neural Network Learning rules
c – the learning constant (learning rate), which scales each weight update
Hebbian Learning Rule
• The learning signal is equal to the
neuron’s output
FEED FORWARD UNSUPERVISED LEARNING
Features of Hebbian Learning
• Feedforward unsupervised learning
• “When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or change takes place in one or both cells, increasing the efficiency”
• If oixj is positive, the result is an increase in the weight; if it is negative, the weight decreases
Final answer:
• For the same inputs, with a bipolar continuous activation function, the final updated weight is given by
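As an illustrative sketch of the Hebbian update Δwi = c·oi·xi (the function name, the choice of tanh as the bipolar continuous activation, and c = 1 are assumptions, not from the slides):

```python
import numpy as np

def hebbian_update(w, x, c=1.0):
    """One Hebbian step: delta_w = c * o * x, where the learning
    signal o = f(w . x) is the neuron's own output (tanh = bipolar
    continuous activation, an assumption here)."""
    o = np.tanh(np.dot(w, x))
    return w + c * o * x

w = hebbian_update(np.array([1.0, -1.0, 0.0]),
                   np.array([1.0, -2.0, 1.5]))
```

Because o·x is positive in the first component here, that weight grows; the second component shrinks, matching the "else vice versa" rule above.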
Perceptron Learning Rule
• Learning signal is the difference between the desired and actual neuron's response
• Learning is supervised
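A minimal sketch of the rule Δw = c·(d − o)·x (the helper name, the bipolar threshold unit, the bias-as-extra-input trick, and the AND example are illustrative assumptions):

```python
import numpy as np

def perceptron_train(patterns, targets, c=0.5, epochs=20):
    """Perceptron learning: delta_w = c * (d - o) * x, where
    o = sign(w . x) is a bipolar threshold unit. The bias is
    handled as an extra input fixed at 1."""
    X = np.hstack([patterns, np.ones((len(patterns), 1))])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, d in zip(X, targets):
            o = 1 if np.dot(w, x) >= 0 else -1
            w = w + c * (d - o) * x   # no change when o == d
    return w

# Bipolar AND function: linearly separable, so the rule converges.
AND_X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
AND_T = [1, -1, -1, -1]
w = perceptron_train(AND_X, AND_T)
```

Note that the weights only move when the response is wrong, which is exactly the "difference between desired and actual response" signal.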
Delta Learning Rule
• Only valid for continuous activation functions
• Used in the supervised training mode
• The learning signal for this rule is called delta
• The aim of the delta rule is to minimize the error over all training patterns
Delta Learning Rule Contd.
The learning rule is derived from the condition of least squared error, E = ½(d − o)², where o = f(net).
Calculating the gradient vector with respect to wi gives ∂E/∂wi = −(d − o) f′(net) xi.
Minimization of the error requires the weight changes to be in the negative gradient direction: Δwi = c (d − o) f′(net) xi.
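A sketch of one negative-gradient step (the tanh activation and the constants are illustrative assumptions); repeating it on a pattern should drive the squared error down:

```python
import numpy as np

def delta_step(w, x, d, c=0.2):
    """One delta-rule step with the bipolar continuous activation
    f(net) = tanh(net), whose derivative is f'(net) = 1 - f(net)**2:
    delta_w = c * (d - o) * f'(net) * x  (negative-gradient direction)."""
    o = np.tanh(np.dot(w, x))
    return w + c * (d - o) * (1.0 - o ** 2) * x

# Repeated steps on one pattern reduce the error E = 0.5*(d - o)^2.
x = np.array([1.0, -1.0, 0.5])
d = 0.8
w = np.zeros(3)
errors = []
for _ in range(50):
    errors.append(0.5 * (d - np.tanh(np.dot(w, x))) ** 2)
    w = delta_step(w, x, d)
```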
Widrow-Hoff Learning Rule
• Also called the least mean square (LMS) learning rule
• Introduced by Widrow (1962); used in supervised learning
• Independent of the activation function
• A special case of the delta learning rule in which the activation function is the identity function, i.e. f(net) = net
• Minimizes the squared error between the desired output value di and neti
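With f(net) = net, the delta rule collapses to Δw = c·(d − net)·x. A sketch (the linear target used for the demonstration is an arbitrary assumption):

```python
import numpy as np

def lms_step(w, x, d, c=0.05):
    """Widrow-Hoff (LMS) step: identity activation, o = net = w . x,
    so delta_w = c * (d - net) * x minimizes (d - net)^2."""
    net = np.dot(w, x)
    return w + c * (d - net) * x

# Recover the weights of a known linear target d = 2*x1 - x2.
rng = np.random.default_rng(0)
w = np.zeros(2)
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0, size=2)
    w = lms_step(w, x, 2.0 * x[0] - x[1])
```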
Winner-Take-All Learning Rule
• Can be explained for a layer of neurons
• An example of competitive learning; used for unsupervised network training
• Learning is based on the premise that one of the neurons in the layer has the maximum response to the input x
• This neuron is declared the winner, and only the winner's weight vector is updated (moved toward x)
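A sketch of one competitive step (the update form Δwm = c·(x − wm) for the winning row m, and all constants, are assumptions in line with standard winner-take-all learning):

```python
import numpy as np

def wta_step(W, x, c=0.2):
    """Winner-take-all step: the row of W with the largest
    response w . x wins; only the winner moves toward x."""
    m = int(np.argmax(W @ x))        # index of the winning neuron
    W[m] = W[m] + c * (x - W[m])     # delta_w_m = c * (x - w_m)
    return W, m

W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
x = np.array([0.9, 0.4])
W, winner = wta_step(W, x)
```

The losing rows are untouched, which is what makes the learning competitive.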
Summary of learning rules
Linear Separability
• Separation of the input space into regions is based on whether the network response is positive or negative
• The line of separation is called a linearly separable line
• Example:
  – The AND and OR functions are linearly separable
  – The XOR function is linearly inseparable
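Both claims can be checked by brute force: a sketch (the integer weight range searched is an arbitrary assumption) that looks for a single bipolar threshold unit reproducing each truth table:

```python
import itertools

def separable_by_threshold(truth_table):
    """Search small integer weights (w1, w2, b) for a threshold unit
    sign(w1*x1 + w2*x2 + b) that matches a bipolar truth table."""
    for w1, w2, b in itertools.product(range(-3, 4), repeat=3):
        if all((1 if w1 * x1 + w2 * x2 + b >= 0 else -1) == y
               for (x1, x2), y in truth_table.items()):
            return True
    return False

AND_TT = {(1, 1): 1, (1, -1): -1, (-1, 1): -1, (-1, -1): -1}
XOR_TT = {(1, 1): -1, (1, -1): 1, (-1, 1): 1, (-1, -1): -1}
```

The search succeeds for AND (e.g. w1 = w2 = 1, b = −1) and fails for XOR, since no single line separates its positive and negative patterns.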
Hebb Network
• The Hebb learning rule is the simplest one
• Learning in the brain is performed by changes in the synaptic gap
• When an axon of cell A is near enough to excite cell B and repeatedly takes part in firing it, some growth process takes place in one or both cells
• According to the Hebb rule, the weight vector increases in proportion to the product of the input and the learning signal:
  wi(new) = wi(old) + xi·y
Flow chart of Hebb training algorithm
1. Start
2. Initialize weights (and bias) to zero
3. For each training pair s : t
   – Activate the input units: xi = si
   – Activate the output unit: y = t
   – Weight update: wi(new) = wi(old) + xi·y
   – Bias update: b(new) = b(old) + y
4. When no training pairs remain, stop
• The Hebb rule can be used for pattern association, pattern categorization, pattern classification and a range of other areas
• Problem to be solved:
  Design a Hebb net to implement the OR function
How to solve
• Use bipolar data in place of binary data
• Initially the weights and bias are set to zero: w1 = w2 = b = 0

Truth table (bipolar OR):
x1   x2   b  |  y
 1    1   1  |  1
 1   -1   1  |  1
-1    1   1  |  1
-1   -1   1  | -1
Training with wi(new) = wi(old) + xi·y, starting from w1 = w2 = b = 0:

Inputs (x1 x2 b)   y  |  Δw1  Δw2  Δb  |  w1  w2  b
 1    1   1        1  |   1    1    1  |   1   1   1
 1   -1   1        1  |   1   -1    1  |   2   0   2
-1    1   1        1  |  -1    1    1  |   1   1   3
-1   -1   1       -1  |   1    1   -1  |   2   2   2
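The table above can be reproduced with a short sketch (the function name is illustrative):

```python
def hebb_train(patterns):
    """Hebb training: wi(new) = wi(old) + xi*y and b(new) = b(old) + y,
    starting from w1 = w2 = b = 0."""
    w1 = w2 = b = 0
    for x1, x2, y in patterns:
        w1 += x1 * y
        w2 += x2 * y
        b += y
    return w1, w2, b

# Bipolar OR patterns as (x1, x2, target y)
OR_PATTERNS = [(1, 1, 1), (1, -1, 1), (-1, 1, 1), (-1, -1, -1)]
```

Running it on the four OR patterns yields the final weights (2, 2, 2) from the last row of the table.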
Learning by trial-and-error
A continuous process of:
• Trial: processing an input to produce an output (in ANN terms: compute the output function of a given input)
• Evaluate: evaluating this output by comparing the actual output with the expected output
• Adjust: adjusting the weights
Example: XOR
• Input layer, with two neurons
• Hidden layer, with three neurons
• Output layer, with one neuron
How it works?
• Set initial values of the weights randomly
• Input: truth table of the XOR
• Do
  – Read input (e.g. 0 and 0)
  – Compute an output (e.g. 0.60543)
  – Compare it to the expected output (Diff = 0.60543)
  – Modify the weights accordingly
• Loop until a condition is met
  – Condition: certain number of iterations
  – Condition: error threshold
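The loop above can be sketched generically (every name here is illustrative, not from the lecture; both stopping conditions from the text are included):

```python
def train(compute_output, adjust_weights, samples,
          max_iters=10000, err_threshold=1e-4):
    """Generic trial / evaluate / adjust loop. Stops after max_iters
    iterations or once the summed squared error falls below
    err_threshold."""
    for _ in range(max_iters):
        total_err = 0.0
        for x, expected in samples:
            actual = compute_output(x)   # trial
            err = expected - actual      # evaluate
            adjust_weights(x, err)       # adjust
            total_err += err * err
        if total_err < err_threshold:    # error-threshold condition
            return

# Usage: a one-weight linear neuron learning the mapping y = 2x.
w = [0.0]

def compute(x):
    return w[0] * x

def adjust(x, err):
    w[0] += 0.1 * err * x

train(compute, adjust, [(1.0, 2.0), (2.0, 4.0), (-1.0, -2.0)])
```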
Who is concerned with NNs?
Design Issues
Initial weights (small random values ∈[‐1,1])
Transfer function (How the inputs and the weights are
combined to produce output?)
Error estimation
Weights adjusting
Number of neurons
Data representation
Size of training set
Transfer Functions
• Linear: the output is proportional to the total weighted input.
• Threshold: the output is set at one of two values, depending on whether the total weighted input is greater than or less than some threshold value.
• Non-linear: the output varies continuously but not linearly as the input changes.
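The three kinds can be sketched directly (using the sigmoid as the non-linear example is an assumption; any continuous non-linear function would fit the slide's definition):

```python
import math

def linear(net):
    return net                           # proportional to the input

def threshold(net, theta=0.0):
    return 1 if net >= theta else 0      # one of two values

def nonlinear(net):
    return 1.0 / (1.0 + math.exp(-net))  # continuous but not linear (sigmoid)
```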
Error Estimation
The root mean square error (RMSE) is a frequently used measure of the differences between values predicted by a model or an estimator and the values actually observed from the thing being modeled or estimated.
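RMSE is the square root of the mean squared difference, which can be written in a few lines:

```python
import math

def rmse(predicted, observed):
    """Root mean square error: sqrt(mean((p - o)^2))."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2
                         for p, o in zip(predicted, observed)) / n)
```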
Weights Adjusting
After each iteration, the weights should be adjusted to minimize the error, for example by:
– Searching over all possible weights
– Back-propagation
Learning Paradigms
Supervised learning
Unsupervised learning
Reinforcement learning
Supervised learning
This is what we have seen so far!
A network is fed with a set of training samples (inputs and the corresponding outputs), and it uses these samples to learn the general relationship between the inputs and the outputs.
This relationship is represented by the values of the weights of the trained network.
Unsupervised learning
No desired output is associated with the training data!
Faster than supervised learning
Used to find out structures within data:
Clustering
Compression
Reinforcement learning
Like supervised learning, but:
• Weight adjusting is not directly related to the error value.
• The error value is used to randomly shuffle the weights!
• Relatively slow learning due to the randomness.
Back Propagation
• Back-propagation is an example of supervised learning; it is used at each layer to minimize the error between the layer's response and the actual data.
• The error at each hidden layer is an average of the evaluated error.
• Hidden-layer networks are trained this way.
Back Propagation
• N is a neuron.
• Nw is one of N's input weights.
• Nout is N's output.
• Nw = Nw + ΔNw
• ΔNw = Nout * (1 − Nout) * NErrorFactor
• NErrorFactor = NExpectedOutput − NActualOutput
• This works only for the last layer, as only there can we know both the actual output and the expected output.
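The slide's last-layer factor can be sketched as follows (reading Nout·(1 − Nout) as the derivative of a logistic activation is an assumption; as on the slide, the learning rate and the input term are omitted):

```python
def output_delta(n_out, expected, actual):
    """Slide's simplified weight-change factor for an output neuron:
    Nout * (1 - Nout) * (Expected - Actual). The Nout*(1-Nout) term
    matches the derivative of a logistic activation."""
    return n_out * (1.0 - n_out) * (expected - actual)
```

With the XOR example's numbers (expected 0, actual 0.60543), the factor is negative, pushing the weights to lower the output.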
Number of neurons
• Many neurons:
  – Higher accuracy
  – Slower
  – Risk of over-fitting: memorizing rather than understanding; the network will be useless with new problems
• Few neurons:
  – Lower accuracy
  – Possibly unable to learn at all
• Aim for the optimal number.
Data representation
Usually input/output data needs pre-processing:
• Pictures: pixel intensity
• Text: a pattern
Size of training set
• No one-size-fits-all formula
• Over-fitting can occur if a "good" training set is not chosen
• What constitutes a "good" training set?
  – Samples must represent the general population
  – Samples must contain members of each class
  – Samples in each class must contain a wide range of variations or noise effects
• The size of the training set is related to the number of hidden neurons
Application Areas
• Function approximation, including time series prediction and modeling
• Classification, including pattern and sequence recognition, novelty detection and sequential decision making (radar systems, face identification, handwritten text recognition)
• Data processing, including filtering, clustering, blind source separation and compression (data mining, e-mail spam filtering)
Advantages / Disadvantages
Advantages
• Adapts to unknown situations
• Powerful: can model complex functions
• Easy to use: learns by example; very little domain-specific user expertise is needed
Disadvantages
• Forgets
• Not exact
• Large complexity of the network structure
Q. How does each neuron work in ANNs? What is back-propagation?
A neuron:
• receives input from many other neurons;
• changes its internal state (activation) based on the current input;
• sends one output signal to many other neurons, possibly including its input neurons (in which case the ANN is a recurrent network).
Back-propagation is a type of supervised learning, used at each layer to minimize the error between the layer's response and the actual data.