NN – cont.
Alexandra I. Cristea, USI intensive course "Adaptive Systems", April-May 2003
• We have seen how the neuron computes; let's now see:
– What can it compute?
– How can it learn?
What does the neuron compute?
Perceptron, discrete neuron
• First, a simple case:
– no hidden layers
– only one neuron
– get rid of the explicit threshold: b becomes w0
– Y is a Boolean function: > 0 fires, otherwise doesn't fire
Threshold function f
[Figure: step function f; the threshold is absorbed as a weight: w0 = -t = -1]
Y = X1 or X2  (W1 = 1, W2 = 1, t = 1)

X1  X2 | Y
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 1

[Figure: inputs X1, X2 feeding a threshold unit f with t = 1]
Y = X1 and X2  (W1 = 0.5, W2 = 0.5, t = 1)

X1  X2 | Y
 0   0 | 0
 0   1 | 0
 1   0 | 0
 1   1 | 1

[Figure: inputs X1, X2 feeding a threshold unit f with t = 1]
Y = or(x1, …, xn):  w1 = w2 = … = wn = 1, t = 1
[Figure: n-input threshold unit f]
Y = and(x1, …, xn):  w1 = w2 = … = wn = 1/n, t = 1
[Figure: n-input threshold unit f]
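The two weight recipes above can be checked exhaustively. Below is a minimal Python sketch (not from the slides; the function name `perceptron` is my own): n = 4 is chosen so that the AND weights 1/n are exact in floating point.

```python
# Discrete (threshold) neuron: fires iff the weighted sum reaches the threshold t.
from itertools import product

def perceptron(weights, threshold, inputs):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

n = 4  # 1/4 is exactly representable, so the AND comparison is exact
for x in product([0, 1], repeat=n):
    # OR: all weights 1, t = 1 -- fires as soon as any input is 1
    assert perceptron([1] * n, 1, x) == (1 if any(x) else 0)
    # AND: all weights 1/n, t = 1 -- fires only when every input is 1
    assert perceptron([1 / n] * n, 1, x) == (1 if all(x) else 0)
```

Note that scaling all weights (and the threshold) by the same positive factor leaves the computed function unchanged; only the sign of the weighted sum relative to the threshold matters.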
What are we actually doing?
The neuron computes w0 + w1*X1 + w2*X2 and fires (Y = 1) when this sum is > 0. The same architecture realises different Boolean functions for different weights:

W0 = -1, W1 = 7,   W2 = 9    →  Y = X1 or X2
W0 = -1, W1 = 0.7, W2 = 0.9  →  Y = X1 and X2
W0 = 1,  W1 = 7,   W2 = 9    →  Y = 1 for every input

[Truth tables of the three functions shown on the original slide]
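The three weight sets can be verified by direct evaluation; this short Python check (my own illustration, not on the slide) thresholds w0 + w1*X1 + w2*X2 at 0 for all four input pairs:

```python
# Which Boolean function does each weight set (w0, w1, w2) realise?
def boolean_fn(w0, w1, w2):
    return {(x1, x2): int(w0 + w1 * x1 + w2 * x2 > 0)
            for x1 in (0, 1) for x2 in (0, 1)}

OR   = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
AND  = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
TRUE = {k: 1 for k in OR}

assert boolean_fn(-1, 7,   9)   == OR    # any active input outweighs the bias
assert boolean_fn(-1, 0.7, 0.9) == AND   # only both inputs together beat the bias
assert boolean_fn(1,  7,   9)   == TRUE  # positive bias: always fires
```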
Linearly Separable Set
[Plot: points in the (x1, x2) plane, the two classes separated by the line w0 + w1*x1 + w2*x2 = 0 with w0 = -1, w1 = -0.67, w2 = 1]
[Further plots: linearly separable sets with separating lines w0 + w1*x1 + w2*x2 = 0 for w0 = -1, w1 = 0.25, w2 = -0.1; w0 = -1, w1 = 0.25, w2 = 0.04; and w0 = -1, w1 = 0.167, w2 = 0.1 — a linearly separable set admits such a line, and typically many of them]
Non-linearly Separable Set
[Plots: sets of points in the (x1, x2) plane for which no line w0 + w1*x1 + w2*x2 = 0 separates the two classes — the weights w0, w1, w2 are left blank because no choice of them works]
Perceptron Classification Theorem
A finite set X can be classified correctly by a one-layer perceptron if and only if it is linearly separable.
Typical non-linearly separable set: Y = XOR(x1, x2)
[Plot in the (x1, x2) plane: the points (0,0) and (1,1) have Y = 0, while (0,1) and (1,0) have Y = 1 — no single line w0 + w1*x1 + w2*x2 = 0 separates them]
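The theorem can be checked directly for XOR. Writing the firing condition as w0 + w1 x1 + w2 x2 > 0, the four input/output pairs impose contradictory constraints (a short derivation, not on the original slide):

```latex
\begin{align*}
(0,0)\mapsto 0 &: \quad w_0 \le 0\\
(1,0)\mapsto 1 &: \quad w_0 + w_1 > 0\\
(0,1)\mapsto 1 &: \quad w_0 + w_2 > 0\\
(1,1)\mapsto 0 &: \quad w_0 + w_1 + w_2 \le 0
\end{align*}
```

Adding the two strict inequalities gives 2w0 + w1 + w2 > 0, while adding the first and last gives 2w0 + w1 + w2 ≤ 0 — a contradiction, so no weight vector exists.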
How does the neuron learn?
Learning: weight computation
For the neuron to realise Y = X1 and X2, the weights must satisfy:
• W1 * 1 + W2 * 1 >= t (= 1)
• W1 * 0 + W2 * 1 < t
• W1 * 1 + W2 * 0 < t
• W1 * 0 + W2 * 0 < t
[Plot: the (X1, X2) plane, shaded where W1*X1 + W2*X2 reaches the threshold]
Perceptron Learning Rule (incremental version)

FOR i := 0 TO n DO wi := random initial value ENDFOR;
REPEAT
  select a pair (x, t) in X;  (* each pair must have a positive probability of being selected *)
  IF wT * x' > 0 THEN y := 1 ELSE y := 0 ENDIF;
  IF y ≠ t THEN
    FOR i := 0 TO n DO wi := wi + (t - y) * xi' ENDFOR
  ENDIF;
UNTIL X is correctly classified

Rosenblatt (1962)
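A runnable Python sketch of the rule above (my own transcription; the name `train_perceptron` and the choice of random initial weights in [-1, 1] are illustrative). x' is the input extended with a constant 1, so w0 absorbs the threshold:

```python
import random

def train_perceptron(data, epochs=100, seed=0):
    """Rosenblatt's rule: on a misclassified pair, w_i := w_i + (t - y) * x_i'."""
    rng = random.Random(seed)
    n = len(data[0][0])
    w = [rng.uniform(-1, 1) for _ in range(n + 1)]   # w[0] plays the role of w0
    for _ in range(epochs):
        mistakes = 0
        for x, t in data:
            xp = [1] + list(x)                        # extended input x'
            y = 1 if sum(wi * xi for wi, xi in zip(w, xp)) > 0 else 0
            if y != t:
                w = [wi + (t - y) * xi for wi, xi in zip(w, xp)]
                mistakes += 1
        if mistakes == 0:                             # X is correctly classified
            return w
    return w

# AND is linearly separable, so the rule converges (perceptron convergence theorem)
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)
for x, t in AND:
    xp = [1] + list(x)
    assert (1 if sum(wi * xi for wi, xi in zip(w, xp)) > 0 else 0) == t
```

On XOR, by contrast, the same loop would run out of epochs without ever reaching zero mistakes.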
Idea of the Perceptron Learning Rule

• If t = 1 but y = 0 (wT x' ≤ 0):  wnew = w + x'  — w moves towards the input x'.
• If t = 0 but y = 1 (wT x' > 0):  wnew = w - x'  — w moves away from the input x'.
• Both cases are covered by:  wi := wi + (t - y) * xi'

In each update, w changes in the direction of the input (with sign + or -).
For multi-layered perceptrons with continuous neurons, a simple and successful learning algorithm exists.
BKP: Error

Input → Hidden layer → Output
Outputs y1…y4, with desired values d1…d4:
e1 = d1 - y1
e2 = d2 - y2
e3 = d3 - y3
e4 = d4 - y4

Hidden layer error: ?
Synapse

W: weight, connecting neuron1 → neuron2
value of neuron1: y1
value of neuron2: y2 = w * y1
(y1, y2 = internal activations)

Forward propagation — the weight serves as an amplifier!
Inverse Synapse

W: weight, connecting neuron1 → neuron2
error at neuron2: e2
error at neuron1: e1 = ?

Backward propagation — the weight serves as an amplifier!
Inverse Synapse (cont.)

error at neuron2: e2
error at neuron1: e1 = w * e2

Backward propagation — the weight serves as an amplifier!
BKP: Error (recap)

The output errors ei = di - yi are known; the hidden layer error is still the open question.
Backpropagation to hidden layer

[Diagram: input I1 → hidden layer (its output O2 is the next layer's input I2) → output O1, with weights w1, w2, w3 and output errors e1, e2, e3]

Backpropagation:  e[j] = Σi e[i] * w[j,i]
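A tiny numeric illustration of e[j] = Σi e[i] * w[j,i] (the values are invented for the example): the output errors flow backwards through the same weights used in the forward pass.

```python
# Errors at three output units, and the weights w[j, i] leaving hidden unit j
output_errors  = [0.5, -0.2, 0.1]
w_from_hidden  = [0.8, -0.4, 0.3]

# Hidden-unit error: weighted sum of the output errors it contributed to
e_j = sum(e * w for e, w in zip(output_errors, w_from_hidden))
print(e_j)  # 0.5*0.8 + (-0.2)*(-0.4) + 0.1*0.3 ≈ 0.51
```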
Update rule for 2 weight types

• ① weights between I2 (hidden layer) and O1 (system output)
• ② weights between I1 (system input) and O2 (hidden layer)

① Δw[j,i] = α (d[i] - y[i]) f'(S[i]) h[j] = α e[i] h[j]   (using the simplification f' = 1, e.g. for a repeater)
   where S[i] = Σj w[j,i](t) h[j], and h[j] = f(S[j]) is the hidden-unit activation

② Δw[k,j] = α (Σi e[i] w[j,i]) f'(S[j]) x[k] = α e[j] x[k]
   where S[j] = Σk w[k,j](t) x[k]
Backpropagation algorithm

FOR s := 1 TO r DO Ws := initial matrix (often random) ENDFOR;
REPEAT
  select a pair (x, t) in X;
  y0 := x;
  (* forward phase: compute the actual output ys of the network with input x *)
  FOR s := 1 TO r DO ys := F(Ws ys-1) ENDFOR;   (* yr is the output vector of the network *)
  (* backpropagation phase: propagate the errors back through the network *)
  (* and adapt the weights of all layers *)
  dr := Fr' (t - yr);
  FOR s := r DOWNTO 2 DO
    ds-1 := Fs-1' WsT ds;
    Ws := Ws + ds ys-1T
  ENDFOR;
  W1 := W1 + d1 y0T
UNTIL stop criterion
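A NumPy sketch of this matrix formulation, under some assumptions not fixed by the slide: F is the sigmoid (so F' = y(1-y)), a learning rate alpha scales the updates, and biases are handled by appending a constant-1 component to each layer's input. Trained on XOR, it shows that a network WITH a hidden layer learns what a single perceptron cannot:

```python
import numpy as np

rng = np.random.default_rng(0)

def addbias(A):
    """Append a constant-1 column so the weight matrices absorb the thresholds."""
    return np.hstack([A, np.ones((A.shape[0], 1))])

sig = lambda s: 1.0 / (1.0 + np.exp(-s))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)     # XOR targets

W1 = rng.normal(0, 1, (3, 4))   # (2 inputs + bias) -> 4 hidden units
W2 = rng.normal(0, 1, (5, 1))   # (4 hidden + bias) -> 1 output
alpha = 0.5

def forward(x):
    h = sig(addbias(x) @ W1)    # forward phase: y_s = F(W_s y_{s-1})
    y = sig(addbias(h) @ W2)
    return h, y

err0 = np.mean((T - forward(X)[1]) ** 2)
for _ in range(5000):
    h, y = forward(X)
    d2 = (T - y) * y * (1 - y)                 # d_r = F_r'(t - y_r)
    d1 = (d2 @ W2[:-1].T) * h * (1 - h)        # d_{s-1} = F_{s-1}' W_s^T d_s (bias row excluded)
    W2 += alpha * addbias(h).T @ d2            # W_s := W_s + alpha d_s y_{s-1}^T
    W1 += alpha * addbias(X).T @ d1
err1 = np.mean((T - forward(X)[1]) ** 2)
print(err0, "->", err1)                        # the squared error drops during training
```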
Conclusion
• We have seen binary (Boolean) function representation with the single-layer perceptron
• We have seen a learning algorithm for the SLP (perceptron learning rule)
• We have seen a learning algorithm for the MLP (backpropagation, BP)
• So, neurons can represent knowledge AND learn!