Supervised learning network

G.Anuradha

TRANSCRIPT

Page 1: Supervised learning network

Supervised learning network

G.Anuradha

Page 2: Supervised learning network

Architecture

• Earlier attempts to build intelligent and self-learning systems used simple components
• Used to solve simple classification problems
• Used by Rosenblatt to explain the pattern-recognition abilities of biological visual systems

[Figure: sensory unit → associator unit → response unit, with a binary activation function taking the values +1, 0, -1]

Page 3: Supervised learning network
Page 4: Supervised learning network

Quiz

• Which of these features would probably not be useful for classifying handwritten digits from binary images?

– Raw pixels from images
– Set of strokes that can be combined to form various digits
– Day of the year on which the digits were drawn
– Number of pixels set to one

Page 5: Supervised learning network

Perceptron networks - Theory: single-layer feed-forward networks

1. It has 3 units:
   1. Input (sensory)
   2. Hidden (associator unit)
   3. Output (response unit)
2. Input-to-hidden weights are fixed (-1, 0, or 1, assigned at random), with a binary activation fn
3. The output unit has a (1, 0, -1) activation, a binary step fn with threshold θ

4. The output of the perceptron is y = f(y_in), where

$$f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$
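A minimal Python sketch of this activation (the function name and the sample threshold value are illustrative, not from the slides):

```python
def perceptron_activation(y_in, theta=0.2):
    """Tri-valued step function of the perceptron response unit.

    Returns 1 above the threshold, -1 below its negative,
    and 0 in the undecided band in between.
    """
    if y_in > theta:
        return 1
    elif y_in < -theta:
        return -1
    else:
        return 0
```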

Page 6: Supervised learning network

Perceptron theory

5. Weights are updated between the hidden and output units

6. The network checks for error between the hidden and output layer

7. Error = target - calculated output

8. Weights are adjusted in case of error

$$w_i(\text{new}) = w_i(\text{old}) + \alpha\, t\, x_i$$

$$b(\text{new}) = b(\text{old}) + \alpha\, t$$

where α is the learning rate and t is the target, which is -1 or 1.

When there is no error, there is no weight change and training is stopped.
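For instance, with α = 1, target t = 1, input x = (1, 1), and initial weights w = (0, 0), b = 0, a misclassified pattern gives w(new) = (0 + 1·1·1, 0 + 1·1·1) = (1, 1) and b(new) = 0 + 1·1 = 1.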

Page 7: Supervised learning network

Single classification perceptron network

[Figure: single classification perceptron network. Input units X1, ..., Xn receive the signals x1, ..., xn and connect to the output unit Y through the weights w1, ..., wn; a bias b enters from a unit x0 with fixed input 1, and the output is y.]

Page 8: Supervised learning network

Perceptron training algo for single output classes

• Step 0: Initialize weights, bias, and learning rate (between 0 and 1)

• Step 1: Perform steps 2-6 while the final stopping condition is false

• Step 2: Perform steps 3-5 for each training pair indicated by s:t

• Step 3: The input layer is applied with the identity activation fn: xi = si

• Step 4: Calculate the net input y_in = b + Σ xi wi and the output

y = f(y_in)

$$f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$

Page 9: Supervised learning network

Perceptron training algo for single output classes

• Step 5: Weight and bias adjustment: compare the values of the actual and desired (target) outputs.

If y ≠ t:

$$w_i(\text{new}) = w_i(\text{old}) + \alpha\, t\, x_i$$

$$b(\text{new}) = b(\text{old}) + \alpha\, t$$

else

wi(new)=wi(old)

b(new)=b(old)

• Step 6: Train the network until there is no weight change. This is the stopping condition for the network. If it is not met, start again from Step 2.

EXAMPLE (see the AND example on Page 14)

Page 10: Supervised learning network

[Flowchart of the training process:]

Start → Initialize weights and bias → Set α (0 to 1)

For each training pair s:t:
– Activate the input units: xi = si
– Calculate the net input y_in
– Apply the activation function: y = f(y_in)
– If y ≠ t: w_i(new) = w_i(old) + α t x_i and b(new) = b(old) + α t
– Else: w(new) = w(old), b(new) = b(old)

If any weight changed during the epoch, repeat; otherwise stop.

Page 11: Supervised learning network

Perceptron training algo for multiple output classes

• Step 0: Initialize the weights, biases, and learning rate suitably

• Step 1: Check for stopping condition; if false then perform steps 2-6

• Step 2: Perform steps 3 to 5 for each bipolar or binary training vector pair s:t

• Step 3: Set the activation (identity) of each input unit, i = 1 to n: xi = si

Page 12: Supervised learning network

Perceptron training algo for multiple output classes

• Step 4: Calculate the output response of each output unit, j = 1 to m:

$$y_{in,j} = b_j + \sum_{i=1}^{n} x_i w_{ij}$$

Activations are applied over the net input to calculate the output response:

$$f(y_{in,j}) = \begin{cases} 1 & \text{if } y_{in,j} > \theta \\ 0 & \text{if } -\theta \le y_{in,j} \le \theta \\ -1 & \text{if } y_{in,j} < -\theta \end{cases}$$

Page 13: Supervised learning network

Perceptron training algo for multiple output classes

• Step 5: Make adjustments in the weights and bias for j = 1 to m and i = 1 to n:

If t_j ≠ y_j then

$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha\, t_j\, x_i$$

$$b_j(\text{new}) = b_j(\text{old}) + \alpha\, t_j$$

else

$$w_{ij}(\text{new}) = w_{ij}(\text{old}), \qquad b_j(\text{new}) = b_j(\text{old})$$

Step 6: Check for the stopping condition; if there is no change in the weights, stop the training process.
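A sketch of one training epoch for the multiple-output case, assuming NumPy (array shapes and names are illustrative):

```python
import numpy as np

def train_epoch(X, T, W, b, alpha=1.0, theta=0.0):
    """One epoch of multi-output perceptron training (steps 3-5).

    X: (p, n) input patterns, T: (p, m) bipolar targets,
    W: (n, m) weights, b: (m,) biases; W and b are float arrays
    updated in place, pattern by pattern.
    """
    for x, t in zip(X, T):
        y_in = b + x @ W                            # step 4: net input y_in,j
        y = np.where(y_in > theta, 1,
                     np.where(y_in < -theta, -1, 0))
        for j in range(W.shape[1]):                 # step 5: adjust only erring units
            if t[j] != y[j]:
                W[:, j] += alpha * t[j] * x
                b[j] += alpha * t[j]
    return W, b
```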

Page 14: Supervised learning network

Example of AND
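Read as code, a self-contained sketch of the single-output algorithm on AND, assuming bipolar inputs and targets with α = 1 and θ = 0.2 (common textbook choices; the slide itself is an image and does not fix these values):

```python
def train_perceptron_and():
    """Train a single perceptron on AND (bipolar inputs and targets)."""
    data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
    w, b, alpha, theta = [0.0, 0.0], 0.0, 1.0, 0.2

    changed = True
    while changed:                    # stop when a full epoch makes no change
        changed = False
        for x, t in data:
            y_in = b + sum(wi * xi for wi, xi in zip(w, x))
            y = 1 if y_in > theta else (-1 if y_in < -theta else 0)
            if y != t:                # adjust weights and bias on error
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
    return w, b

print(train_perceptron_and())         # converges to w = [1.0, 1.0], b = -1.0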

Page 15: Supervised learning network

Linear separability

• The perceptron network is used for the linear separability concept.

• The separating line is based on the threshold θ.

• The condition for separating the response from the region of positive to the region of zero is

$$w_1 x_1 + w_2 x_2 + b > \theta$$

• The condition for separating the response from the region of zero to the region of negative is

$$w_1 x_1 + w_2 x_2 + b < -\theta$$

For the AND example above (θ = 0.2), the learned weights w1 = w2 = 1, b = -1 satisfy both conditions, with only the input (1, 1) falling in the positive region.

Page 16: Supervised learning network

What binary threshold neurons cannot do

• A binary threshold output unit cannot even tell if two single bit features are the same!

Positive cases (same): (1,1) → 1; (0,0) → 1

Negative cases (different): (1,0) → 0; (0,1) → 0

• The four input-output pairs give four inequalities that are impossible to satisfy:

$$w_1 + w_2 \ge \theta, \qquad 0 \ge \theta, \qquad w_1 < \theta, \qquad w_2 < \theta$$

The first two force θ ≤ 0 and w1 + w2 ≥ θ, while the last two give w1 + w2 < 2θ ≤ θ: a contradiction.

Page 17: Supervised learning network

A geometric view of what binary threshold neurons cannot do

• Imagine "data-space", in which the axes correspond to components of an input vector.
– Each input vector is a point in this space.
– A weight vector defines a plane in data-space.
– The weight plane is perpendicular to the weight vector and misses the origin by a distance equal to the threshold.

[Figure: the four points (0,0), (0,1), (1,0), (1,1) in data-space, with a weight plane dividing output = 1 from output = 0; the positive and negative cases cannot be separated by a plane.]

Page 18: Supervised learning network

Discriminating simple patterns under translation with wrap-around

• Suppose we just use pixels as the features.

• Can a binary threshold unit discriminate between different patterns that have the same number of on pixels?
– Not if the patterns can translate with wrap-around!

[Figure: three translated copies each of pattern A and pattern B, all with the same number of on pixels.]

Page 19: Supervised learning network

Learning with hidden units

• For problems that are not linearly separable, we require an additional layer, called a hidden layer.

• Networks without hidden units are very limited in the input-output mappings they can learn to model.

• We need multiple layers of adaptive, non-linear hidden units.

Page 20: Supervised learning network

Solution to EXOR problem
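The slide's figure is not reproduced in the transcript. As an illustrative sketch (hand-picked weights, not the learned solution from the slide), two hidden threshold units and one output unit compute XOR:

```python
def step(z):
    """Binary step: fire (1) when the net input is positive."""
    return 1 if z > 0 else 0

def xor(x1, x2):
    """XOR via one hidden layer of threshold units.

    h1 fires only for (1, 0), h2 only for (0, 1);
    the output unit ORs them together.
    """
    h1 = step(x1 - x2 - 0.5)      # x1 AND NOT x2
    h2 = step(x2 - x1 - 0.5)      # x2 AND NOT x1
    return step(h1 + h2 - 0.5)    # h1 OR h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))   # prints the XOR truth table
```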

Page 21: Supervised learning network

ADALINE

• A network with a single linear unit is called an ADALINE (ADAptive LINear Neuron)
• The input-output relationship is linear
• Uses bipolar activation for its input signals and its target output
• Weights between the input and output are adjustable, and there is only one output unit
• Trained using the delta rule (least mean square, or Widrow-Hoff, rule)

Page 22: Supervised learning network

Architecture

• Delta rule for a single output unit
– Minimize the error over all training patterns.
– Done by reducing the error for each pattern, one at a time.

• The delta rule for adjusting the weight of the ith input (i = 1 to n) is

$$\Delta w_i = \alpha\,(t - y_{in})\,x_i$$

• The delta rule in case of several output units, for adjusting the weight from the ith input unit to the jth output unit, is

$$\Delta w_{ij} = \alpha\,(t_j - y_{in,j})\,x_i$$
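A minimal sketch of one delta-rule pass for a single output unit (the function name and learning rate are illustrative):

```python
def delta_rule_update(w, b, x, t, alpha=0.1):
    """One Widrow-Hoff update: move the weights along the negative
    gradient of the squared error (t - y_in)**2 for this pattern."""
    y_in = b + sum(wi * xi for wi, xi in zip(w, x))   # linear output
    err = t - y_in
    w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
    b = b + alpha * err
    return w, b
```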

Page 23: Supervised learning network

Difference between Perceptron and Delta Rule

Perceptron rule: originates from the Hebbian assumption; stops after a finite number of learning steps.

Delta rule: derived from the gradient-descent method; continues indefinitely, converging asymptotically to the solution; minimizes the error over all training patterns.

Page 24: Supervised learning network

Architecture

[Figure: Adaline architecture. Inputs x0 = 1, x1, x2, ..., xn with weights b, w1, w2, ..., wn feed a linear unit computing y_in = b + Σ xi wi; an output-error generator forms e = t - y_in, which the adaptive (LMS) algorithm uses to adjust the weights.]

Page 25: Supervised learning network

[Flowchart of Adaline training:]

Start → Initialize weights, bias and α → Input the specified tolerance error Es

For each training pair s:t:
– Activate the input units: xi = si
– Calculate the net input: y_in = b + Σ xi wi
– Update the weights and bias:
  w_i(new) = w_i(old) + α (t - y_in) x_i
  b(new) = b(old) + α (t - y_in)
– Accumulate the error: E_i = Σ (t - y_in)²

If E_i ≤ Es, stop; otherwise repeat with the next epoch.
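Read as code, the flowchart might look like the following minimal sketch (tolerance, step size, and data are illustrative; the epoch cap is a safety guard, not part of the flowchart):

```python
def train_adaline(data, alpha=0.02, tol=1.1, max_epochs=1000):
    """Adaline training by the delta rule; stops once the summed
    squared error of an epoch falls to the tolerance Es (`tol`)."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        error = 0.0
        for x, t in data:
            y_in = b + sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + alpha * (t - y_in) * xi for wi, xi in zip(w, x)]
            b += alpha * (t - y_in)
            error += (t - y_in) ** 2
        if error <= tol:              # stopping condition: E_i <= E_s
            break
    return w, b

# Bipolar AND: the least-squares floor of the epoch error is 1.0, so Es
# must sit just above it; weights settle near [0.5, 0.5], bias near -0.5.
and_data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(train_adaline(and_data))
```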

Page 26: Supervised learning network

Madaline

• Two or more Adalines are integrated to develop the Madaline model

• Used for nonlinearly separable logic functions, such as the XOR function

• Used for adaptive noise cancellation and adaptive inverse control

• In noise cancellation, the objective is to filter out an interference component by identifying a linear model of a measurable noise source and the corresponding immeasurable interference.

• Examples: ECG, echo elimination from long-distance telephone transmission lines
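The noise-cancellation idea can be sketched in a few lines, assuming NumPy (the signals, filter length, and step size are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
signal = np.sin(0.03 * np.arange(n))            # signal of interest (unknown to the filter)
noise_src = rng.standard_normal(n)              # measurable noise source
interference = np.convolve(noise_src, [0.8, -0.4])[:n]  # unknown linear noise path
primary = signal + interference                 # what the sensor actually records

# Adaline/LMS filter: learn a linear model of the noise path
taps, mu = 4, 0.01
w = np.zeros(taps)
clean = np.zeros(n)
for k in range(taps - 1, n):
    x = noise_src[k - taps + 1:k + 1][::-1]     # recent reference samples
    y = w @ x                                   # estimated interference
    e = primary[k] - y                          # error = cleaned-signal estimate
    w += mu * e * x                             # LMS (delta-rule) update
    clean[k] = e

print(np.mean((clean[1000:] - signal[1000:]) ** 2))  # small residual power
```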