Neural Network
Neural network (connectionist) architectures are not meant to duplicate the operation of the
human brain, but rather to draw inspiration from known facts about how the brain works. They
are characterized by having:
A large number of very simple, neuron-like processing elements.
A large number of weighted connections between the elements. The weights on the
connections encode the knowledge of a network.
Highly parallel, distributed control.
An emphasis on learning internal representations automatically.
Hopfield Networks:
Neural network researchers contend that there is a basic mismatch between standard computer
information processing technology and the technology used by the brain.
Hopfield introduced a neural network that he proposed as a theory of memory. A Hopfield
network has the following interesting features:
Distributed Representation – A memory is stored as a pattern of activation across a set of
processing elements. Furthermore, memories can be superimposed on one another;
different memories are represented by different patterns over the same set of processing
elements.
Distributed, Asynchronous Control – Each processing element makes decisions based
only on its own local situation. All these local actions add up to a global solution.
Content-Addressable Memory – A number of patterns can be stored in a network. To
retrieve a pattern, we need only specify a portion of it. The network automatically finds
the closest match.
Fault Tolerance – If a few processing elements misbehave or fail completely, the network
will still function properly.
A simple Hopfield net is shown in figure 1. Processing elements, or units, are always in one of
two states, active or inactive. In the figure, units colored black are active and units colored white
are inactive. Units are connected to each other with weighted, symmetric connections. A
positively weighted connection indicates that the units tend to activate each other. A negative
connection allows an active unit to deactivate a neighboring unit.
The network operates as follows. A random unit is chosen. If any of its neighbors are active, the
unit computes the sum of the weights on the connections to those active neighbors. If the sum is
positive, the unit becomes active; otherwise it becomes inactive. Another random unit is chosen,
and the process repeats until the network reaches a stable state, i.e., until no more units can
change state. This process is called parallel relaxation. If the network starts in the state shown in
figure 1, the unit in the lower left corner will tend to activate the unit above it. This unit, in turn,
will attempt to activate the unit above it, but the inhibitory connection from the upper-right unit
will foil this attempt, and so on.
This network has only four distinct stable states, which are shown in figure 2. Given any initial
state, the network will necessarily settle into one of these four configurations. The stable state in
which all units are inactive can only be reached if it is also the initial state. The network can be
thought of as storing the patterns in figure 2. Hopfield’s major contribution was to show that
given any set of weights and any initial state, his parallel relaxation algorithm would eventually
steer the network into a stable state. There can be no divergence or oscillation.
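The update rule just described can be sketched in Python. This is a minimal sketch, not code from the source: the dictionary-free weight-matrix representation, the function name, and the fixed step budget standing in for a proper stability check are all assumptions.

```python
import random

def parallel_relaxation(weights, state, steps=1000, seed=0):
    """Asynchronous Hopfield-style updating: repeatedly pick a random
    unit and, if it has any active neighbors, make it active iff the
    summed weight on connections to those active neighbors is positive.

    weights: symmetric matrix, weights[i][j] = connection weight (0 = none)
    state:   list of 0/1 activations (1 = active)
    """
    rng = random.Random(seed)          # seeded for reproducibility
    state = list(state)
    n = len(state)
    for _ in range(steps):             # crude budget; a stable state persists
        i = rng.randrange(n)
        active = [j for j in range(n) if j != i and state[j]]
        if active:                     # with no active neighbors, no change
            total = sum(weights[i][j] for j in active)
            state[i] = 1 if total > 0 else 0
    return state
```

Note that a state with no active units never changes under this rule, which matches the observation below that the all-inactive stable state is reachable only as an initial state.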
The network can be used as a content-addressable memory by setting the activities of the units to
correspond to a partial pattern. To retrieve a pattern, we need only supply a portion of it. The
network will then settle into the stable state that best matches the partial pattern. An example is
shown in figure 3.
Parallel relaxation is nothing more than search. It is useful to think of the various states of a
network as forming a search space, as in figure 4. A randomly chosen state will transform itself
ultimately into one of the local minima, namely the nearest stable state. This is how we get the
content-addressable behavior. In figure 4, state B is depicted as being lower than state A because
fewer constraints are violated. A constraint is violated, for example, when two active units are
connected by a negatively weighted connection.
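The "height" in this landscape picture can be made concrete with the standard Hopfield energy function, which the source alludes to but does not write out; the formula below is the usual one, not taken from the text.

```python
def energy(weights, state):
    """Hopfield energy E = -1/2 * sum over i != j of w_ij * s_i * s_j.
    Lower energy corresponds to fewer violated constraints; e.g. two
    active units joined by a negative weight raise the energy."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n) if i != j)
```

For instance, two active units joined by an inhibitory weight sit at higher energy than the configuration with one of them switched off, which is why relaxation moves away from it.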
Content-Addressable Memory
a function or map
can learn some patterns (e.g. image of letter “J”) and given a partial pattern (slightly
scrambled image of “J”) will reproduce the nearest learnt pattern
Hopfield network has units that are either active or inactive
weighted, symmetric connections between units
a pattern can be described as some particular combination of active units in a state (each
unit has a unique name or identifier)
learning a pattern = adjusting the weights
recalling a pattern = parallel relaxation algorithm
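The notes say "learning a pattern = adjusting the weights" without giving the rule. One common choice is the Hebb rule, shown here as an assumed illustration (the ±1 pattern encoding and the function name are not from the source):

```python
def hebbian_weights(patterns):
    """Hebb-rule sketch for storing patterns in a Hopfield net.
    Each pattern is a list of +1/-1 values, one per unit.
    w_ij = sum over patterns of p_i * p_j; self-connections stay 0,
    and the resulting matrix is symmetric, as the notes require."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]   # units that agree get excitatory links
    return w
```

Units that tend to be in the same state across the stored patterns end up with positive (mutually activating) weights; units that disagree end up with negative (inhibitory) weights.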
parallel relaxation algorithm: boulder-valley analogy
o we have a boulder rolling down a valley
o the bottom of the valley represents the pattern that the Hopfield net has learnt
o wherever the boulder is initially placed, it will roll towards the nearest local
minimum – this represents the Hopfield net iteratively processing the next
network state
o the boulder will eventually stop rolling at the bottom of the valley – this
represents the stable state of the Hopfield network
o our landscape can have multiple valleys – this represents the Hopfield network
learning multiple patterns; note “crafting the landscape” represents “learning”
o depending on where the boulder is initially placed it will roll towards the nearest
valley – given some partial pattern, the Hopfield network (using the parallel
relaxation algorithm) will eventually stabilize at the closest matching pattern
Perceptrons
perceptron “learning” means adjusting the weights w0…wn
activation function
o each unit (neuron) has n incoming connections (x1…xn) each with weights
(w1…wn)
o by default input x0 =1 meaning that w0 always applies – this acts as the threshold
as follows:
(figure: successive weighted sums as units update — initial: −2; step 1: +3; step 2: +3;
step 3: −1; step 4: −3; step 5: +3; step 6: continue until stable)
o implicit threshold: the sum of all inputs must be greater than or equal to 0
e.g. for 2 inputs: w0 + w1x1 + w2x2 ≥ 0
o explicit threshold: the sum of all inputs must be greater than or equal to value t
(threshold)
note that we sum from i=1 to n (rather than from i=0)
So what about w0? If we set w0 = −t, then summing from i=0 gives the same test:
w1x1 + w2x2 ≥ t (here's the explicit threshold)
−t + w1x1 + w2x2 ≥ 0 (subtract t from both sides)
w0 + w1x1 + w2x2 ≥ 0 (we're back to an implicit threshold, with w0 = −t)
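The implicit-threshold unit above can be sketched directly (a minimal sketch; the function name and the example AND/OR weight values are assumptions, not from the source):

```python
def perceptron(weights, inputs):
    """Threshold unit with an implicit threshold: input x0 is fixed at 1,
    so weights[0] plays the role of -t. Fires (returns 1) when
    w0 + w1*x1 + ... + wn*xn >= 0."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total >= 0 else 0
```

For example, weights [−1.5, 1, 1] give logical AND on 0/1 inputs (the sum only reaches −1.5 + 2 = 0.5 when both inputs are 1), while [−0.5, 1, 1] give OR.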
perceptron can only represent linearly separable functions
o e.g. for two inputs (two dimensional system) it can separate the black and white
dots because it can draw a line through the graph that divides them
o line equation:
y = mx + b
o perceptron activation function as a line equation (set w0 + w1x1 + w2x2 = 0 and solve for x2):
x2 = −(w1/w2)x1 − (w0/w2)
o note that this applies to any number of dimensions (any i≥1 used as xi )
e.g. 1-dimensional system might look like:
where the perceptron needs to find a good point (-w0 / w1) that splits the
white and black dots
e.g. 3-dimensional system might look like:
…where the perceptron needs to find a good plane
o perceptrons can’t learn functions that are not linearly separable
e.g. XOR:
Let x1 and x2 only take values of either 0 or 1
if x1=x2 then dot is white (represents Boolean false)
if x1≠x2 then dot is black (represents Boolean true)
we can’t draw a straight line through this graph that separates the white
dots from the black dots
e.g. same problem in 1D as well:
we can’t choose a single point that will separate the black and white dots
o solution: multilayer networks
e.g. multilayer network with one hidden layer:
o Why does this solve the problem?
hidden layers increase the hypothesis space that our neural network can
represent
think of each hidden unit as a perceptron that draws a line through our 2D
graph (to classify dots as black or white)
so now our output unit can take the combination of these lines
e.g. consider XOR function again
Let x1 and x2 only take values of either 0 or 1
units a1 and a2 represent inputs x1 and x2 respectively
perceptron a3
a1 + a2 ≥ 0.5
perceptron a4
a1 + a2 ≥ 1.5
the perceptron will give a “1” (on) if the input (dot) lies on the side of the
line that has a big red “1”
now we’ll combine a3 and a4 to capture the XOR function
we want a3=1 and a4=0 to give a “1” in our network
note that no dot can satisfy a3=0 a4=1 (compare the graphs
above)
the complete network is on the right
perceptron a5
a3 − a4 ≥ 0.5
Complete multilayer feed-forward neural net
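The complete network, using exactly the thresholds given above, can be written out as follows (the function names are assumptions; the thresholds 0.5, 1.5, 0.5 come from the notes):

```python
def step(total, t):
    """Explicit-threshold activation: fire iff the weighted sum >= t."""
    return 1 if total >= t else 0

def xor_net(x1, x2):
    """Two-layer feed-forward net for XOR, per the a3/a4/a5 construction."""
    a3 = step(x1 + x2, 0.5)    # OR-like: fires on (0,1), (1,0), (1,1)
    a4 = step(x1 + x2, 1.5)    # AND-like: fires only on (1,1)
    return step(a3 - a4, 0.5)  # fires exactly when a3=1 and a4=0
```

Checking the four inputs confirms the construction: only (0,1) and (1,0) put a3 on and keep a4 off, so only they drive a5 past its threshold.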
note that there are lots of ways (infinite) to implement XOR in a neural
network
the error values at the “hidden” layer are sort of mysterious because the
training data only gives values for the input and the desired output
(doesn’t specify what values the hidden layers should take)
“learning” in multilayer feed-forward neural nets uses the back-
propagation algorithm
the error from the output layer is propagated back through to the
hidden layers (assess blame by dividing the error up among the
contributing weights)
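One back-propagation step for a tiny sigmoid network can be sketched as below. This is an assumed illustration of the standard algorithm, not code from the notes: the network shape (2 inputs, 2 hidden units, 1 output), the squared-error loss, and all names are choices made here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(w_hid, w_out, x, target, lr=0.1):
    """One back-propagation step for a 2-input, 2-hidden-unit, 1-output
    sigmoid network. Each weight vector's element 0 is a bias (input 1).
    Returns updated weights and the squared error before the update."""
    xb = [1.0] + list(x)                                  # prepend bias input
    # forward pass
    h = [sigmoid(sum(w * xi for w, xi in zip(wh, xb))) for wh in w_hid]
    hb = [1.0] + h
    y = sigmoid(sum(w * hi for w, hi in zip(w_out, hb)))
    err = target - y
    # output delta: error scaled by the sigmoid slope at the output
    d_out = err * y * (1 - y)
    # hidden deltas: blame divided among hidden units via the output weights
    d_hid = [d_out * w_out[k + 1] * h[k] * (1 - h[k]) for k in range(len(h))]
    # gradient-descent updates
    new_w_out = [w + lr * d_out * hi for w, hi in zip(w_out, hb)]
    new_w_hid = [[w + lr * d * xi for w, xi in zip(wh, xb)]
                 for wh, d in zip(w_hid, d_hid)]
    return new_w_hid, new_w_out, err ** 2
```

The `d_hid` line is the "assess blame" step: each hidden unit's share of the output error is proportional to the weight it contributes through.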