CS621 : Artificial Intelligence
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 24
Sigmoid neuron and backpropagation
Feedforward n/w
• A multilayer feedforward neural network has:
  – Input layer
  – Output layer
  – Hidden layer (assists computation)
• Output units and hidden units are called computation units.
Architecture of the n/w
• Fully connected feedforward network
• Pure FF network (no jumping of connections over layers)

(Figure: layered architecture – input layer with n i/p neurons, one or more hidden layers, and an output layer with m o/p neurons; wji is the weight of the connection from neuron i to neuron j.)
Training of the MLP
• Multilayer Perceptron (MLP)
• Question: How to find weights for the hidden layers when no target output is available for them?
• Credit assignment problem – to be solved by “Gradient Descent”
Gradient Descent Technique
• Let E be the error at the output layer
• ti = target output; oi = observed output
• i is the index going over n neurons in the outermost layer
• j is the index going over the p patterns (1 to p)
• Ex: XOR – p = 4 and n = 1

$$E = \frac{1}{2}\sum_{j=1}^{p}\sum_{i=1}^{n}(t_{ij} - o_{ij})^2$$
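To make the notation concrete, here is a minimal Python sketch that computes E for the XOR case of p = 4 patterns and n = 1 output neuron. The variable names (`targets`, `outputs`) and the observed output values are my own illustrative assumptions, not from the slides:

```python
# Total squared error E = 1/2 * sum over patterns j and output neurons i
# of (t_ij - o_ij)^2, illustrated for XOR: p = 4 patterns, n = 1 output.

targets = [0.0, 1.0, 1.0, 0.0]   # t_j for the four XOR patterns
outputs = [0.1, 0.8, 0.7, 0.2]   # hypothetical observed outputs o_j

E = 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))
print(f"E = {E:.4f}")            # 0.5 * (0.01 + 0.04 + 0.09 + 0.04) = 0.09
```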
Weights in a ff NN
• wmn is the weight of the connection from the nth neuron to the mth neuron
• The E vs W surface is a complex surface in the space defined by the weights wij
• $-\frac{\partial E}{\partial w_{mn}}$ gives the direction in which a movement of the operating point in the $w_{mn}$ co-ordinate space will result in maximum decrease in error

$$\Delta w_{mn} \propto -\frac{\partial E}{\partial w_{mn}}$$
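To illustrate moving the operating point along $-\partial E/\partial w$, here is a small Python sketch. The one-dimensional quadratic error surface, the numeric gradient, and all names are illustrative assumptions, not the network's actual error surface:

```python
# Gradient-descent sketch: step the weight opposite to dE/dw, the
# direction of maximum decrease in error, on a toy quadratic surface.

def error(w):
    return (w - 3.0) ** 2          # toy surface with its minimum at w = 3

def grad(w, h=1e-6):
    return (error(w + h) - error(w - h)) / (2 * h)   # numeric dE/dw

w, eta = 0.0, 0.1                  # starting weight, learning rate
for step in range(50):
    w -= eta * grad(w)             # delta w proportional to -dE/dw
print(f"w after descent: {w:.4f}") # approaches 3.0
```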
Sigmoid neurons

• Gradient descent needs a derivative computation – not possible in the perceptron due to the discontinuous step function used!
• Sigmoid neurons, with easy-to-compute derivatives, are used instead!
• Computing power comes from the non-linearity of the sigmoid function.

$$y \to 0 \text{ as } x \to -\infty; \qquad y \to 1 \text{ as } x \to \infty$$
Derivative of Sigmoid function
$$y = \frac{1}{1+e^{-x}}$$

$$\frac{dy}{dx} = \frac{e^{-x}}{(1+e^{-x})^2} = \frac{1}{1+e^{-x}}\left(1 - \frac{1}{1+e^{-x}}\right) = y(1-y)$$
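A quick numerical sanity check of the identity dy/dx = y(1 − y), written in Python as an illustration (not from the slides):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# The closed form dy/dx = y(1 - y) should match a numeric derivative.
for x in (-2.0, 0.0, 2.0):
    y = sigmoid(x)
    closed = y * (1.0 - y)
    h = 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(f"x={x:+.1f}  y(1-y)={closed:.6f}  numeric={numeric:.6f}")
```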
Training algorithm
• Initialize weights to random values.
• For input x = <xn,xn-1,…,x0>, modify weights as follows
Target output = t, Observed output = o
• Iterate until E < ε (threshold)

$$\Delta w_i \propto -\frac{\partial E}{\partial w_i}, \qquad E = \frac{1}{2}(t-o)^2$$
Calculation of ∆wi
$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} \qquad (\eta = \text{learning constant}, \; 0 \le \eta \le 1)$$

$$\frac{\partial E}{\partial W} = \frac{\partial E}{\partial net} \cdot \frac{\partial net}{\partial W}, \quad \text{where } net = \sum_{i=0}^{n} w_i x_i$$

$$\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial o} \cdot \frac{\partial o}{\partial net} \cdot \frac{\partial net}{\partial w_i} = -(t-o)\,o(1-o)\,x_i$$

$$\Delta w_i = \eta\,(t-o)\,o(1-o)\,x_i$$
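Putting the update rule to work, here is a minimal Python sketch of a single sigmoid neuron trained with Δwi = η(t − o)o(1 − o)xi. The AND target function, the bias-as-input convention (x0 = 1), and the hyperparameters are my assumptions for illustration:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One sigmoid neuron trained on AND with the rule derived above:
# delta w_i = eta * (t - o) * o * (1 - o) * x_i.
# x[0] = 1 is a bias input, so w[0] plays the role of the threshold.
patterns = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
random.seed(0)
w = [random.uniform(-0.5, 0.5) for _ in range(3)]
eta = 0.5

for epoch in range(5000):
    for x, t in patterns:
        o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(len(w)):
            w[i] += eta * (t - o) * o * (1 - o) * x[i]

for x, t in patterns:
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    print(x[1:], t, round(o, 2))
```

After training, the observed outputs approach 0 for the first three patterns and 1 for the last.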
Observations
Does the training technique support our intuition?
• The larger the xi, the larger is ∆wi
– Error burden is borne by the weight values corresponding to large input values
Observations contd.
• ∆wi is proportional to the departure from target
• Saturation behaviour when o is 0 or 1
• If o < t, ∆wi > 0 and if o > t, ∆wi < 0, which is consistent with Hebb’s law
Hebb’s law
• If nj and ni are both in the excitatory state (+1)
  – Then the change in weight must be such that it enhances the excitation
  – The change is proportional to both levels of excitation: ∆wji is prop. to e(nj) e(ni)
• If ni and nj are in a mutual state of inhibition (one is +1 and the other is -1)
  – Then the change in weight is such that the inhibition is enhanced (change in weight is negative)

(Figure: neuron ni connected to neuron nj through weight wji.)
Saturation behavior
• The algorithm is iterative and incremental
• If the weight values or the number of inputs is very large, the net input becomes large and the output lands in the saturation region.
• The weight values hardly change in the saturation region
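A short Python illustration (my own, not from the slides) of why updates stall in saturation: the factor o(1 − o) that multiplies every ∆wi collapses towards zero once the net input is large:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# In the saturation region o is near 0 or 1, so the factor o*(1 - o)
# in delta w_i almost vanishes and the weights hardly change.
for net in (0.0, 2.0, 5.0, 10.0):
    o = sigmoid(net)
    print(f"net={net:5.1f}  o={o:.5f}  o*(1-o)={o * (1 - o):.5f}")
```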
Backpropagation algorithm
• Fully connected feedforward network
• Pure FF network (no jumping of connections over layers)

(Figure: the same layered architecture as before – input layer with n i/p neurons, hidden layers, output layer with m o/p neurons, and weight wji on the connection from neuron i to neuron j.)
Gradient Descent Equations
$$\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} \qquad (\eta = \text{learning rate}, \; 0 \le \eta \le 1)$$

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{ji}} \qquad (net_j = \text{input at the } j\text{th layer})$$
Example - Character Recognition
• Output layer – 26 neurons (one for each capital letter)
• The first output neuron has the responsibility of detecting all forms of ‘A’
• Centralized representation of outputs
• In distributed representations, all output neurons participate in output
Backpropagation – for outermost layer
$$\delta_j = -\frac{\partial E}{\partial net_j} = -\frac{\partial E}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j} \qquad (net_j = \text{input at the } j\text{th neuron})$$

$$E = \frac{1}{2}\sum_{p=1}^{m}(t_p - o_p)^2$$

Hence,

$$\delta_j = (t_j - o_j)\,o_j(1-o_j)$$

$$\Delta w_{ji} = \eta\,(t_j - o_j)\,o_j(1-o_j)\,o_i$$
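A small Python sketch of this output-layer update written out for a whole layer of neurons; the numeric values and the names (`t`, `o`, `o_prev`) are invented for illustration:

```python
# Output-layer deltas and weight changes from the formulas above.
eta = 0.5
t = [1.0, 0.0]                 # targets for two output neurons
o = [0.7, 0.3]                 # their observed outputs
o_prev = [0.9, 0.1, 0.4]       # outputs o_i of the previous layer

delta = [(tj - oj) * oj * (1 - oj) for tj, oj in zip(t, o)]
dw = [[eta * dj * oi for oi in o_prev] for dj in delta]
print(delta)                   # [0.063, -0.063]
print(dw)                      # dw[j][i] = eta * delta_j * o_i
```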
Backpropagation for hidden layers
(Figure: layered architecture – input layer with n i/p neurons, hidden layers, and output layer with m o/p neurons; hidden neuron j feeds neuron k in the next layer.)

δk is propagated backwards to find the value of δj.
Backpropagation – for hidden layers
$$\Delta w_{ji} = \eta\,\delta_j\,o_i$$

$$\delta_j = -\frac{\partial E}{\partial net_j} = -\frac{\partial E}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j} = -\frac{\partial E}{\partial o_j}\; o_j(1-o_j)$$

$$-\frac{\partial E}{\partial o_j} = \sum_{k \in \text{next layer}} \delta_k\, w_{kj}$$

Hence,

$$\delta_j = o_j(1-o_j) \sum_{k \in \text{next layer}} \delta_k\, w_{kj}$$

$$\Delta w_{ji} = \eta\; o_j(1-o_j) \left(\sum_{k \in \text{next layer}} \delta_k\, w_{kj}\right) o_i$$
General Backpropagation Rule
• General weight updating rule:

$$\Delta w_{ji} = \eta\,\delta_j\,o_i$$

• where

$$\delta_j = (t_j - o_j)\,o_j(1-o_j) \quad \text{for the outermost layer}$$

$$\delta_j = o_j(1-o_j) \sum_{k \in \text{next layer}} \delta_k\, w_{kj} \quad \text{for hidden layers}$$
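The general rule is enough to train a complete network. Below is a minimal Python sketch of a 2-2-1 network learning XOR; the architecture, random seed, learning rate, and epoch count are my assumptions, and (as the following slides note) a bad initialization can still leave it stuck in a local minimum:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# 2-2-1 feedforward network trained on XOR with the general rule
# delta w_ji = eta * delta_j * o_i. Biases are handled as an extra
# input fixed at 1. All hyperparameters are illustrative choices.
random.seed(1)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # hidden weights
W2 = [random.uniform(-1, 1) for _ in range(3)]                      # output weights
eta = 0.5
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

for epoch in range(20000):
    for x, t in data:
        xi = x + [1]                                        # input plus bias
        h = [sigmoid(sum(w * v for w, v in zip(row, xi))) for row in W1]
        hi = h + [1]                                        # hidden outputs plus bias
        o = sigmoid(sum(w * v for w, v in zip(W2, hi)))
        d_out = (t - o) * o * (1 - o)                       # outermost-layer delta
        d_hid = [hj * (1 - hj) * d_out * W2[j]              # hidden delta: back-propagated
                 for j, hj in enumerate(h)]
        for i in range(3):
            W2[i] += eta * d_out * hi[i]                    # output-layer update
        for j in range(2):
            for i in range(3):
                W1[j][i] += eta * d_hid[j] * xi[i]          # hidden-layer update

for x, t in data:
    xi = x + [1]
    h = [sigmoid(sum(w * v for w, v in zip(row, xi))) for row in W1]
    o = sigmoid(sum(w * v for w, v in zip(W2, h + [1])))
    print(x, t, round(o, 2))
```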
Issues in the training algorithm
• Algorithm is greedy. It always changes weight such that E reduces.
• The algorithm may get stuck in a local minimum.
• If we observe that E is not getting reduced anymore, the following may be the reasons:
Issues in the training algorithm contd.
1. Stuck in local minimum.
2. Network paralysis. (High –ve or +ve i/p makes neurons saturate.)

3. η (the learning rate) is too small.
Diagnostics in action(1)
1) If stuck in local minimum, try the following:
• Re-initialize the weight vector.
• Increase the learning rate.
• Introduce more neurons in the hidden layer.
Diagnostics in action (1) contd.
2) If it is network paralysis, then increase the number of neurons in the hidden layer.
Problem: How to configure the hidden layer?
Known: One hidden layer seems to be sufficient. [Kolmogorov (1960’s)]
Diagnostics in action(2)
Kolmogorov's statement:
A feedforward network with three layers (input, output and hidden) with appropriate I/O relation that can vary from neuron to neuron is sufficient to compute any function.
More hidden layers reduce the size of individual layers.
Diagnostics in action(3)
3) Observe the outputs: If they are close to 0 or 1, try the following:
1. Scale the inputs or divide by a normalizing factor.
2. Change the shape and size of the sigmoid.
Diagnostics in action (3) contd.

1. Introduce a momentum factor β.
• Accelerates the movement out of the trough.
• Dampens oscillation inside the trough.
• Choosing β: if β is large, we may jump over the minimum.

$$(\Delta w_{ji})_{n\text{th iteration}} = \eta\,\delta_j\,o_i + \beta\,(\Delta w_{ji})_{(n-1)\text{th iteration}}$$
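A one-step Python illustration of the momentum update (all numeric values invented):

```python
# Momentum update: the current weight change blends the gradient term
# with the previous iteration's change, scaled by beta.
eta, beta = 0.5, 0.9
delta_j, o_i = 0.063, 0.9     # example values from an output neuron
prev_dw = 0.02                # (delta w_ji) from the previous iteration

dw = eta * delta_j * o_i + beta * prev_dw
print(f"delta w = {dw}")      # 0.5*0.063*0.9 + 0.9*0.02 ~= 0.046
```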
An application in Medical Domain
Expert System for Skin Diseases Diagnosis
• Bumpiness and scaliness of skin
• Mostly for symptom gathering and for developing diagnosis skills
• Not replacing doctor’s diagnosis
Architecture of the FF NN
• 96-20-10 architecture
• 96 input neurons, 20 hidden layer neurons, 10 output neurons
• Inputs: skin disease symptoms and their parameters
  – Location, distribution, shape, arrangement, pattern, number of lesions, presence of an active border, amount of scale, elevation of papules, color, altered pigmentation, itching, pustules, lymphadenopathy, palmar thickening, results of microscopic examination, presence of herald patch, result of dermatology test called KOH
Output
• 10 neurons indicative of the diseases:
  – psoriasis, pityriasis rubra pilaris, lichen planus, pityriasis rosea, tinea versicolor, dermatophytosis, cutaneous T-cell lymphoma, secondary syphilis, chronic contact dermatitis, seborrheic dermatitis
Training data
• Input specs of 10 model diseases from 250 patients
• 0.5 if some specific symptom value is not known
• Trained using standard error backpropagation algorithm
Testing
• Previously unused symptom and disease data of 99 patients
• Result:
  – Correct diagnosis achieved for 70% of papulosquamous group skin diseases
  – Success rate above 80% for the remaining diseases except for psoriasis
  – Psoriasis diagnosed correctly only in 30% of the cases
  – Psoriasis resembles other diseases within the papulosquamous group of diseases, and is somewhat difficult even for specialists to recognise.
Explanation capability
• Rule based systems reveal the explicit path of reasoning through the textual statements
• Connectionist expert systems reach conclusions through complex, non linear and simultaneous interaction of many units
• Analysing the effect of a single input or a single group of inputs would be difficult and would yield incorrect results
Explanation contd.
• The hidden layer re-represents the data
• Outputs of hidden neurons are neither symptoms nor decisions
Figure : Explanation of dermatophytosis diagnosis using the DESKNET expert system.
(Figure: symptom and parameter input nodes – duration of lesions in weeks, minimal itching, positive KOH test, lesions located on feet, minimal increase in pigmentation, positive test for pseudohyphae and spores, plus a bias unit – feed a hidden "internal representation" layer, which connects to the disease diagnosis nodes, among them node 5 (dermatophytosis), node 0 (psoriasis) and node 9 (seborrheic dermatitis); the links carry learned weights such as 1.62, 1.43, 2.13, -2.71 and -2.48.)
Discussion
• Symptoms and parameters contributing to the diagnosis are found from the n/w
• Standard deviation, mean and other tests of significance are used to arrive at the importance of the contributing parameters
• The n/w acts as an apprentice to the expert