The Cognitive Inversion (Lecture 13, UTK)
TRANSCRIPT
Artificial Neural Networks 11/2/06 17:29
11/2/06 1
Lecture 13
Artificial Neural Networks
11/2/06 2
The Cognitive Inversion
• Computers can do some things very well that are difficult for people
– e.g., arithmetic calculations
– playing chess & other board games
– doing proofs in formal logic & mathematics
– handling large amounts of data precisely
• But computers are very bad at some things that are easy for people (and even some animals)
– e.g., face recognition & general object recognition
– autonomous locomotion
– sensory-motor coordination
• Conclusion: brains work very differently from von Neumann computers
11/2/06 3
The 100-Step Rule
• Typical recognition tasks take less than one second
• Neurons take several milliseconds to fire
• Therefore there can be at most about 100 sequential processing steps
11/2/06 4
Two Different Approaches to Computing
Von Neumann: Narrow but Deep
…
Neural Computation: Shallow but Wide
11/2/06 5
How Wide?
Retina: 100 million receptors
Optic nerve: one million nerve fibers
11/2/06 6
Neurons are Not Logic Gates
• Speed
– electronic logic gates are very fast (nanoseconds)
– neurons are comparatively slow (milliseconds)
• Precision
– logic gates are highly reliable digital devices
– neurons are imprecise analog devices
• Connections
– logic gates have few inputs (usually 1 to 3)
– many neurons have >100,000 inputs
11/2/06 7
Artificial Neural Networks
11/2/06 8
Typical Artificial Neuron
[diagram: inputs, connection weights, threshold, output]
11/2/06 9
Typical Artificial Neuron
[diagram: summation, net input, activation function]
11/2/06 10
Operation of Artificial Neuron
Weights: 2, –1, 4; threshold: 2; inputs: 0, 1, 1.

0 × 2 = 0
1 × (–1) = –1
1 × 4 = 4

Net input: 3; since 3 > 2, output = 1.
11/2/06 11
Operation of Artificial Neuron
Weights: 2, –1, 4; threshold: 2; inputs: 1, 1, 0.

1 × 2 = 2
1 × (–1) = –1
0 × 4 = 0

Net input: 1; since 1 < 2, output = 0.
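The two worked examples above can be sketched in a few lines of Python (an illustrative sketch, not part of the original slides; weights 2, –1, 4 and threshold 2 are taken from the slides):

```python
def threshold_neuron(inputs, weights, threshold):
    """Fire (output 1) iff the weighted input sum exceeds the threshold."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net > threshold else 0

weights = [2, -1, 4]
print(threshold_neuron([0, 1, 1], weights, 2))  # net = 3 > 2, so 1
print(threshold_neuron([1, 1, 0], weights, 2))  # net = 1 < 2, so 0
```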
11/2/06 12
Feedforward Network
[diagram: input layer, hidden layers, output layer]
11/2/06 13
Feedforward Network in Operation (1)

[diagram: input applied to input layer]
11/2/06 14
Feedforward Network in Operation (2)

[diagram: activation propagating forward]
11/2/06 15
Feedforward Network in Operation (3)

[diagram: activation propagating forward]
11/2/06 16
Feedforward Network in Operation (4)

[diagram: activation propagating forward]
11/2/06 17
Feedforward Network in Operation (5)

[diagram: activation propagating forward]
11/2/06 18
Feedforward Network in Operation (6)

[diagram: activation propagating forward]
11/2/06 19
Feedforward Network in Operation (7)

[diagram: activation propagating forward]
11/2/06 20
Feedforward Network in Operation (8)

[diagram: output produced at output layer]
11/2/06 21
Comparison with Non-Neural-Net Approaches
• Non-NN approaches typically decide output from a small number of dominant factors
• NNs typically look at a large number of factors, each of which weakly influences output
• NNs permit:
– subtle discriminations
– holistic judgments
– context sensitivity
11/2/06 22
Connectionist Architectures
• The knowledge is implicit in the connection weights between the neurons
• Items of knowledge are not stored in dedicated memory locations, as in a von Neumann computer
• “Holographic” knowledge representation:
– each knowledge item is distributed over many connections
– each connection encodes many knowledge items
• Memory & processing are robust in the face of damage, errors, inaccuracy, noise, …
11/2/06 23
Differences from Digital Calculation
• Information represented in continuous images (rather than language-like structures)
• Information processing by continuous image processing (rather than explicit rules applied in individual steps)
• Indefiniteness is inevitable (rather than definiteness assumed)
11/2/06 24
Supervised Learning
• Produce desired outputs for training inputs
• Generalize reasonably & appropriately to other inputs
• Good example: pattern recognition
• Neural nets are trained rather than programmed
– another difference from von Neumann computation
11/2/06 25
Learning for Output Neuron (1)
Weights: 2, –1, 4; threshold: 2; inputs: 0, 1, 1.
Net input: 3; since 3 > 2, output = 1.
Desired output: 1. No change!
11/2/06 26
Learning for Output Neuron (2)
Weights: 2, –1, 4; threshold: 2; inputs: 0, 1, 1.
Net input: 3; since 3 > 2, output = 1.
Desired output: 0. Output is too large.
Decrease the weights on the active inputs: –1 → –1.5 and 4 → 3.25.
New net input: 1.75; since 1.75 < 2, output = 0.
11/2/06 27
Learning for Output Neuron (3)
Weights: 2, –1, 4; threshold: 2; inputs: 1, 1, 0.
Net input: 1; since 1 < 2, output = 0.
Desired output: 0. No change!
11/2/06 28
Learning for Output Neuron (4)
Weights: 2, –1, 4; threshold: 2; inputs: 1, 1, 0.
Net input: 1; since 1 < 2, output = 0.
Desired output: 1. Output is too small.
Increase the weights on the active inputs: 2 → 2.75 and –1 → –0.5.
New net input: 2.25; since 2.25 > 2, output = 1.
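The four cases above follow a perceptron-style rule: nudge the weights of the active inputs toward the desired output, and change nothing when the output is already correct. A minimal sketch (the slides adjust the active weights by slightly different amounts; the uniform step of 0.5 here is an illustrative simplification):

```python
def train_step(inputs, weights, threshold, desired, rate=0.5):
    """Perceptron-style update: nudge the weights of active inputs
    toward the desired output; leave them alone if output is correct."""
    net = sum(x * w for x, w in zip(inputs, weights))
    actual = 1 if net > threshold else 0
    error = desired - actual   # +1: too small, -1: too large, 0: correct
    return [w + rate * error * x for x, w in zip(inputs, weights)]

weights = [2, -1, 4]
print(train_step([0, 1, 1], weights, 2, desired=1))  # correct output: weights unchanged
print(train_step([0, 1, 1], weights, 2, desired=0))  # too large: active weights shrink
```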
11/2/06 29
Credit Assignment Problem
[diagram: multilayer network with input layer, hidden layers, output layer, and desired output]

How do we adjust the weights of the hidden layers?
11/2/06 30
Back-Propagation: Forward Pass

[diagram: network with input and desired output]
11/2/06 31
Back-Propagation: Actual vs. Desired Output

[diagram: network with input and desired output]
11/2/06 32
Back-Propagation: Correct Output Neuron Weights

[diagram: network with input and desired output]
11/2/06 33
Back-Propagation: Correct Last Hidden Layer

[diagram: network with input and desired output]
11/2/06 34
Back-Propagation: Correct All Hidden Layers

[diagram: network with input and desired output]
11/2/06 35
Back-Propagation: Correct All Hidden Layers

[diagram: network with input and desired output]
11/2/06 36
Back-Propagation: Correct All Hidden Layers

[diagram: network with input and desired output]
11/2/06 37
Back-Propagation: Correct First Hidden Layer

[diagram: network with input and desired output]
11/2/06 38
Use of Back-Propagation (BP)
• Typically the weights are changed slowly
• Typically the net will not give correct outputs for all training inputs after one adjustment
• Each input/output pair is used repeatedly for training
• BP may be slow
• But there are many better ANN learning algorithms
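The forward pass and layer-by-layer weight corrections sketched on the preceding slides can be written compactly. This is an illustrative sketch only: the network sizes, learning rate, random seed, and the XOR task are my assumptions, not the lecture's example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)  # training inputs
Y = np.array([[0], [1], [1], [0]], float)              # desired outputs (XOR)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)         # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)         # hidden -> output
sigmoid = lambda h: 1 / (1 + np.exp(-h))

for _ in range(5000):
    # forward pass
    H = sigmoid(X @ W1 + b1)
    out = sigmoid(H @ W2 + b2)
    # backward pass: propagate output error to each layer's weights
    d_out = (out - Y) * out * (1 - out)
    d_H = (d_out @ W2.T) * H * (1 - H)
    W2 -= 0.5 * H.T @ d_out; b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_H;   b1 -= 0.5 * d_H.sum(0)

print(np.round(out.ravel(), 2))  # trained outputs for the four XOR inputs
```

Each input/output pair is presented repeatedly, as the slide says; the error shrinks a little with each slow weight change.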
11/2/06 39
ANN Training Procedures
• Supervised training: we show the net the output it should produce for each training input (e.g., BP)
• Reinforcement training: we tell the net if its output is right or wrong, but not what the correct output is
• Unsupervised training: the net attempts to find patterns in its environment without external guidance
11/2/06 40
Applications of ANNs
• “Neural nets are the second-best way of doing everything”
• If you really understand a problem, you can design a special-purpose algorithm for it, which will beat a NN
• However, if you don’t understand your problem very well, you can generally train a NN to do it well enough
11/2/06 41
The Hopfield Network
(and constraint satisfaction)
11/2/06 42
Hopfield Network
• Symmetric weights: wij = wji
• No self-action: wii = 0
• Zero threshold: θ = 0
• Bipolar states: si ∈ {–1, +1}
• Discontinuous bipolar activation function:
  sgn(h) = –1 if h < 0, +1 if h > 0
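One asynchronous update under these conventions can be sketched as follows (an illustrative sketch; the two-neuron weight matrix is my own toy example):

```python
import numpy as np

def hopfield_step(s, W, i):
    """Update neuron i from its local field h_i = sum_j w_ij * s_j,
    setting s_i = sgn(h_i); sgn is undefined at 0, so leave s_i alone."""
    h = W[i] @ s
    if h != 0:
        s[i] = 1 if h > 0 else -1
    return s

W = np.array([[0., 1.], [1., 0.]])  # two mutually excitatory neurons
s = np.array([-1, 1])
hopfield_step(s, W, 0)
print(s)  # neuron 0 flips to agree with neuron 1: [1 1]
```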
11/2/06 43
Positive Coupling
• Positive sense (sign)
• Large strength
11/2/06 44
Negative Coupling
• Negative sense (sign)
• Large strength
11/2/06 45
Weak Coupling
• Either sense (sign)
• Little strength
11/2/06 46
State = –1 & Local Field < 0
h < 0
11/2/06 47
State = –1 & Local Field > 0
h > 0
11/2/06 48
State Reverses
h > 0
11/2/06 49
State = +1 & Local Field > 0
h > 0
11/2/06 50
State = +1 & Local Field < 0
h < 0
11/2/06 51
State Reverses
h < 0
11/2/06 52
Hopfield Net as Soft Constraint Satisfaction System
• States of neurons as yes/no decisions
• Weights represent soft constraints between decisions
– hard constraints must be respected
– soft constraints have degrees of importance
• Decisions change to better respect constraints
• Is there an optimal set of decisions that best respects all constraints?
11/2/06 53
Demonstration of Hopfield Net
Run Hopfield Demo
11/2/06 54
Convergence
• Does such a system converge to a stable state?
• Under what conditions does it converge?
• There is a sense in which each step relaxes the “tension” in the system (or increases its “harmony”)
• But could a relaxation of one neuron lead to greater tension in other places?
11/2/06 55
Quantifying Harmony

[diagram: harmony scale from H = 10 down to H = –10; increasing harmony upward, decreasing harmony downward]
11/2/06 56
Energy
• “Energy” (or “tension”) is the opposite of harmony
• E = –H
11/2/06 57
Energy Does Not Increase
• In each step in which a neuron is considered for update:
  E{s(t + 1)} – E{s(t)} ≤ 0
• Energy cannot increase
• Energy decreases if any neuron changes
• Must it stop? (Yes)
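The claim can be checked numerically with E = –½ sΤW s, the standard Hopfield energy (consistent with E = –H above). The random symmetric weights and the small tolerance for float round-off are my own choices for the sketch:

```python
import numpy as np

def energy(s, W):
    """Hopfield energy E = -1/2 * s^T W s."""
    return -0.5 * s @ W @ s

rng = np.random.default_rng(1)
n = 8
A = rng.normal(size=(n, n))
W = (A + A.T) / 2          # symmetric weights: w_ij = w_ji
np.fill_diagonal(W, 0)     # no self-action: w_ii = 0
s = rng.choice([-1.0, 1.0], size=n)

E0 = energy(s, W)
for i in range(n):         # one asynchronous sweep over the neurons
    h = W[i] @ s           # local field of neuron i
    if h != 0:
        s[i] = 1.0 if h > 0 else -1.0
E1 = energy(s, W)
print(E1 <= E0 + 1e-12)    # True: energy never increased
```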
11/2/06 58
Conclusion
• If we do asynchronous updating, the Hopfield net must reach a stable, minimum-energy state in a finite number of updates
• This does not imply that it is a global minimum
11/2/06 59
Conceptual Picture of Descent on Energy Surface
(fig. from Solé & Goodwin)

11/2/06 60
Energy Surface
(fig. from Haykin Neur. Netw.)
11/2/06 61
Energy Surface + Flow Lines
(fig. from Haykin Neur. Netw.)

11/2/06 62
Flow Lines: Basins of Attraction
(fig. from Haykin Neur. Netw.)
11/2/06 63
Storing Memories as Attractors
(fig. from Solé & Goodwin)

11/2/06 64
Demonstration of Hopfield Net
Run Hopfield Demo
11/2/06 65
Example of Pattern Restoration
(fig. from Arbib 1995)

11/2/06 66
Example of Pattern Restoration
(fig. from Arbib 1995)
11/2/06 67
Example of Pattern Restoration
(fig. from Arbib 1995)

11/2/06 68
Example of Pattern Restoration
(fig. from Arbib 1995)
11/2/06 69
Example of Pattern Restoration
(fig. from Arbib 1995)

11/2/06 70
Example of Pattern Completion
(fig. from Arbib 1995)
11/2/06 71
Example of Pattern Completion
(fig. from Arbib 1995)

11/2/06 72
Example of Pattern Completion
(fig. from Arbib 1995)
11/2/06 73
Example of Pattern Completion
(fig. from Arbib 1995)

11/2/06 74
Example of Pattern Completion
(fig. from Arbib 1995)
11/2/06 75
Example of Association
(fig. from Arbib 1995)

11/2/06 76
Example of Association
(fig. from Arbib 1995)
11/2/06 77
Example of Association
(fig. from Arbib 1995)

11/2/06 78
Example of Association
(fig. from Arbib 1995)
11/2/06 79
Example of Association
(fig. from Arbib 1995)

11/2/06 80
Applications of Hopfield Memory
• Pattern restoration
• Pattern completion
• Pattern generalization
• Pattern association
11/2/06 81
Hopfield Net for Optimization and for Associative Memory
• For optimization:
– we know the weights (couplings)
– we want to know the minima (solutions)
• For associative memory:
– we know the minima (retrieval states)
– we want to know the weights
11/2/06 82
Hebb’s Rule
“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
—Donald Hebb (The Organization of Behavior, 1949, p. 62)
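For a Hopfield memory, a standard reading of Hebb’s rule sets each weight proportional to the correlation of the two neurons over the stored patterns. A sketch (the 1/n normalization and the two toy patterns are illustrative choices, not from the slides):

```python
import numpy as np

def hebbian_weights(patterns):
    """Imprint patterns: w_ij = (1/n) * sum over patterns of s_i * s_j."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n   # sum of outer products
    np.fill_diagonal(W, 0)          # no self-action
    return W

patterns = np.array([[1, -1, 1, -1],
                     [1, 1, -1, -1]], float)
W = hebbian_weights(patterns)
# Each imprinted pattern is then a fixed point of the sgn update:
s = patterns[0]
print(np.array_equal(np.sign(W @ s), s))  # True
```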
11/2/06 83
Example of Hebbian Learning: Pattern Imprinted
11/2/06 84
Example of Hebbian Learning: Partial Pattern Reconstruction
11/2/06 85
Stochastic Neural Networks
(in particular, the stochastic Hopfield network)
11/2/06 86
Trapping in Local Minimum
11/2/06 87
Escape from Local Minimum
11/2/06 88
Escape from Local Minimum
11/2/06 89
Motivation
• Idea: with low probability, go against the local field
– move up the energy surface
– make the “wrong” microdecision
• Potential value for optimization: escape from local optima
• Potential value for associative memory: escape from spurious states
– because they have higher energy than imprinted states
11/2/06 90
The Stochastic Neuron
Deterministic neuron: s′i = sgn(hi)

Stochastic neuron:
Pr{s′i = +1} = σ(hi)
Pr{s′i = –1} = 1 – σ(hi)

Logistic sigmoid: σ(h) = 1 / (1 + exp(–2h/T))

[plot: σ(h) vs. h]
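The stochastic neuron can be sketched directly from these formulas (an illustrative sketch; the fixed random seed is my own choice for reproducibility):

```python
import math
import random

def sigma(h, T=1.0):
    """Logistic sigmoid with pseudo-temperature T."""
    return 1 / (1 + math.exp(-2 * h / T))

def stochastic_neuron(h, T=1.0, rng=random.Random(0)):
    """Fire +1 with probability sigma(h, T), else -1."""
    return +1 if rng.random() < sigma(h, T) else -1

print(sigma(0))                  # 0.5, regardless of T
print(round(sigma(3, T=0.1), 6)) # effectively 1: strong field, low T
```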
11/2/06 91
Properties of Logistic Sigmoid
• As h → +∞, σ(h) → 1
• As h → –∞, σ(h) → 0
• σ(0) = 1/2

σ(h) = 1 / (1 + e^(–2h/T))
11/2/06 92
Logistic Sigmoid with Varying T

11/2/06 93
Logistic Sigmoid, T = 0.5
Slope at origin = 1 / 2T

11/2/06 94
Logistic Sigmoid, T = 0.01

11/2/06 95
Logistic Sigmoid, T = 0.1

11/2/06 96
Logistic Sigmoid, T = 1
11/2/06 97
Logistic Sigmoid, T = 10
11/2/06 98
Logistic Sigmoid, T = 100
11/2/06 99
Pseudo-Temperature
• Temperature = measure of thermal energy (heat)
• Thermal energy = vibrational energy of molecules
• A source of random motion
• Pseudo-temperature = a measure of nondirected (random) change
• Logistic sigmoid gives same equilibrium probabilities as the Boltzmann–Gibbs distribution
11/2/06 100
Simulated Annealing
(Kirkpatrick, Gelatt & Vecchi, 1983)
11/2/06 101
Dilemma
• In the early stages of search, we want a high temperature, so that we will explore the space and find the basins of the global minimum
• In the later stages we want a low temperature, so that we will relax into the global minimum and not wander away from it
• Solution: decrease the temperature gradually during search
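A gradual decrease of temperature is often implemented as a geometric cooling schedule. A minimal sketch (the starting temperature, decay factor 0.95, and stopping point are illustrative assumptions, not from the slides):

```python
def annealing_schedule(T0=10.0, decay=0.95, T_min=0.01):
    """Yield a geometrically decreasing sequence of temperatures."""
    T = T0
    while T > T_min:
        yield T
        T *= decay

temps = list(annealing_schedule())
print(len(temps), temps[0], round(temps[-1], 4))
```

Each neuron update would then use the current temperature in σ(h) = 1 / (1 + exp(–2h/T)), so early updates are nearly random and late ones nearly deterministic.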
11/2/06 102
Quenching vs. Annealing
• Quenching:
– rapid cooling of a hot material
– may result in defects & brittleness
– local order but global disorder
– locally low-energy, globally frustrated
• Annealing:
– slow cooling (or alternate heating & cooling)
– reaches equilibrium at each temperature
– allows global order to emerge
– achieves global low-energy state
11/2/06 103
Multiple Domains
[diagram: local coherence, global incoherence]
11/2/06 104
Moving Domain Boundaries
11/2/06 105
Effect of Moderate Temperature
(fig. from Anderson Intr. Neur. Comp.)

11/2/06 106
Effect of High Temperature
(fig. from Anderson Intr. Neur. Comp.)
11/2/06 107
Effect of Low Temperature
(fig. from Anderson Intr. Neur. Comp.)

11/2/06 108
Annealing Schedule
• Controlled decrease of temperature
• Should be sufficiently slow to allow equilibrium to be reached at each temperature
• With sufficiently slow annealing, the global minimum will be found with probability 1
• Design of schedules is a topic of research
11/2/06 109
Demonstration of Boltzmann Machine & Necker Cube Example
Run ~mclennan/pub/cube/cubedemo
11/2/06 110
Necker Cube
11/2/06 111
Biased Necker Cube
11/2/06 112
Summary
• Non-directed change (random motion) permits escape from local optima and spurious states
• Pseudo-temperature can be controlled to adjust the relative degree of exploration and exploitation