TRANSCRIPT
Übungen (Practicals) http://www.physik3.gwdg.de/cns
4 exercise sheets, over three weeks, each covering contents parallel to the lectures
Required: C++ programming and gnuplot, i.e., the lecture Computational Physics
Submission of programs + 10 min presentation and discussion
Details: First session right after the lecture in the PC pool: Room C.00.110
Learning and Memory
Learning:
• Learning types/classes and learning rules (overview)
• Conceptualizing about learning
• Math (rules, algorithms and convergence) together with the biological substrate for different learning rules, and biological and some other applications (pattern recognition, robotics, etc.)
Memory:
• Theories
• Biological substrate
• Integrative models – towards cognition
Different Types/Classes of Learning
Unsupervised Learning (non-evaluative feedback)
• Trial-and-error learning.
• No error signal.
• No influence from a teacher; correlation evaluation only.
Reinforcement Learning (evaluative feedback)
• (Classical & instrumental) conditioning, reward-based learning.
• “Good–bad” error signals.
• A teacher defines what is good and what is bad.
Supervised Learning (evaluative error-signal feedback)
• Teaching, coaching, imitation learning, learning from examples, and more.
• Rigorous error signals.
• Direct influence from a teacher/teaching signal.
[Figure: overview diagram linking Machine Learning, Classical Conditioning and Synaptic Plasticity.
Evaluative feedback (rewards) – Reinforcement Learning (example based): Dynamic Programming (Bellman equation), Monte Carlo control, Q-learning, SARSA, TD(λ) (often λ = 0), TD(1), TD(0), δ-rule, eligibility traces, Actor/Critic (technical & basal ganglia); Rescorla/Wagner, neuronal TD models (“Critic”), neuronal TD formalism; neuronal reward systems (basal ganglia), dopamine.
Non-evaluative feedback (correlations) – Unsupervised Learning (correlation based): correlation-based control (non-evaluative), ISO-learning, ISO-control, differential Hebb rule (“fast”); Hebb rule, differential Hebb rule (“slow”), ISO model of STDP, STDP models (biophysical & network), STDP, LTP (LTD = anti), biophysics of synaptic plasticity (glutamate); supervised learning sits apart.
“=” marks equivalences across the columns. Overall theme: anticipatory control of actions and prediction of values vs. correlation of signals.]
Overview of different methods
Supervised learning: many more methods exist!
What can neurons compute? What can networks compute?
Neurons can compute ONLY correlations!
Networks can compute anything.
What is the biological substrate for all learning?
The synapse / synaptic strength (the connection strength between two neurons).
The Basics and a Quick Comparison (before the maths really starts)
The Neuroscience Basics as a Six-Slide Crash Course
I forgot to make a backup of my brain. All that I had learned last term is gone now.
Structure of a Neuron:
• At the dendrite the incoming signals arrive (incoming currents).
• At the soma the currents are finally integrated.
• At the axon hillock action potentials are generated if the potential crosses the membrane threshold.
• The axon transmits (transports) the action potential to distant sites.
• At the synapses the outgoing signals are transmitted onto the dendrites of the target neurons.
Ion channels:
Ion channels consist of big (protein) molecules which are inserted into the membrane and connect intra- and extracellular space.
Channels act as a resistance against the free flow of ions. Electrical resistor R:

I_R = (V_m − V_rest)/R = g (V_m − V_rest)

If V_m = V_rest (resting potential) there is no current flow: electrical and chemical gradients are balanced (with opposite signs).
Channels are normally ion-selective and will open and close depending on the membrane potential (normal case) but also on (other) ions (e.g. NMDA channels).
Channels exist for: K+, Na+, Ca2+, Cl−.
What happens at a chemical synapse during signal transmission:
[Figure: pre-synaptic action potential, concentration of transmitter in the synaptic cleft, post-synaptic action potential]
1. The pre-synaptic action potential depolarises the axon terminals and Ca2+ channels open.
2. Ca2+ enters the pre-synaptic cell, by which the transmitter vesicles are forced to open and release the transmitter.
3. Thereby the concentration of transmitter increases in the synaptic cleft and transmitter diffuses to the postsynaptic membrane.
4. Transmitter-sensitive channels at the postsynaptic membrane open. Na+ and Ca2+ enter, K+ leaves the cell. An excitatory postsynaptic current (EPSC) is thereby generated, which leads to an excitatory postsynaptic potential (EPSP).
Information is stored in a neural network by the strength of its synaptic connections.
Up to 10000 synapses per neuron.
Contact points (synaptic spines) for other neurons; growth or new generation of contact points.
Time Scales
[Figure: memory and physiology across time scales (msec, sec, min, hrs, days, years). Memory: working memory → short-term memory → long-term memory. Physiology: activity → short-term plasticity → long-term plasticity → structural plasticity.]
Tetzlaff et al. (2012). Biol. Cybern.
For learning: one input, one output.

An unsupervised learning rule (basic Hebb rule):
dω_i/dt = µ u_i v,   µ << 1

A supervised learning rule (Delta rule):
ω_i^(t+1) = ω_i^t − µ dE^t/dω_i
No input, no output, one error-function derivative, where the error function compares input with output examples.

A reinforcement learning rule (TD learning):
ω_i^(t+1) = ω_i^t + µ [r^(t+1) + γ v^(t+1) − v^t] u_i^t
One input, one output, one reward.
How can correlations be learned?
[Figure: input x1 with weight ω1 (×) onto output v]
Correlation-based (Hebbian) learning: the basic Hebb rule

dω_1/dt = µ u_1 v,   µ << 1

correlates inputs with outputs. This rule is temporally symmetrical!
Conventional Hebbian Learning
[Figure: symmetrical weight-change curve – synaptic change (%) against the timing of pre-synaptic (t_Pre) and post-synaptic (t_Post) activity]
dω_1/dt = µ u_1 v
The temporal order of input and output does not play any role.
Our Standard Notation
[Figure: input x1 → synapse with variable weight ω1 (multiplication ×) → neuron (Σ) → output v]
Synapse = amplifier with variable weight ω1.
Neuron: will sum the different inputs (here only one).
Hebbian learning, dω_1/dt = µ u_1 × v: correlation between input and output.
ω
δ
1
X
x1
r
vv ’Σ
E
Σ
Compare to Reinforcement Learning (RL)
SynapseNeuron
Output
Input
Error Term
Reward Derivative
Trace
Correlation
This is Hebb !
What is this eligibility trace E good for?
[Figure: the same RL circuit with eligibility trace E]
Equation for RL, so-called Temporal Difference (TD) learning:
ω_i^(t+1) = ω_i^t + µ [r^(t+1) + γ v^(t+1) − v^t] u_i^t
The reductionist approach of a theoretician: we start by making a single-compartment model of a dog!
[Figure: unconditioned stimulus (food) with fixed weight ω0 = 1 and conditioned stimulus (bell) passed through a stimulus trace E and weighted by the plastic weight ω1 (Δω1 > 0, ×), summed (Σ) into the response]
What is this eligibility trace E good for? The first stimulus needs to be “remembered” in the system.
TD Learning – condition for convergence: δ = 0
[Figure: the RL circuit as before, with input u1 (x1), weight ω1, output v, derivative v′, reward r, trace E and error δ]
δ^t = r^(t+1) + γ v^(t+1) − v^t
δ is measured at the output of the system (output control).
[Figure: learning methods ordered along axes of learning speed and autonomy – correlation-based learning (no teacher), reinforcement learning (indirect influence), reinforcement learning (direct influence), supervised learning (teacher), programming. Evaluative feedback (Klopf 1988), with external value systems, corresponds to output control through observation of the agent; non-evaluative feedback, with true internal value systems, corresponds to input control at the agent's own sensors. Loop: designer → reinforcement → agent ↔ environment]
The Difference between Input and Output Control
Is this a “real” or just an “academic” problem? Why would we want input control?
[Figure: designer → wrong reinforcement → agent ↔ environment; output control through observation of the agent vs. input control at the agent's own sensors]
The output-control paradigm can and does lead to major problems in reinforcement learning, because the wrong behaviour might be reinforced.
A “funny” example:
[Figure: an agent with its own goals (me!) lectures to the environment (you!) via speech; my perception of your attentiveness closes an input-control loop that allows me to control my lecture and to learn to improve. Prior knowledge and shared goals help! A “Martian” observer with other sensors (e.g. body heat) and/or other goals would instead reinforce my running around.]
Relevance for learning:
1) Use output control to get a system that does what YOU want (an engineering system).
2) Use input control to get an autonomous (biologically motivated) system.
Other considerations: Is this a real or just an academic problem?
[Figure: the learning-speed vs. autonomy spectrum repeated – correlation-based learning (no teacher), reinforcement learning (indirect/direct influence), supervised learning (teacher), programming]
[Figure: an observer observes an observable quantity V of an agent that behaves (the system) and reinforces it]
What is the desired (most often occurring) output state? Zero!
Experiment:
Assume you have one lever by which you can try to drive V towards zero whenever it suddenly deviates.
[Figure: a trace of V fluctuating around 0, and a lever]
Here are some valid solutions for a V = 0 reinforcement. How should the lever be moved?
The System: A Braitenberg Vehicle
V. Braitenberg (1984), “Vehicles: Experiments in Synthetic Psychology”
[Figure: vehicle with motors & wheels (motor signals A_R, A_L) and range-limited sensors (sensor signals S_R, S_L) in 1:1 connection; without a stimulus S_R = S_L = 0]
What the agent wanted to learn was to approach the yellow food blob (the sensible stimulus) and eat it.
Output signal: V = S_R − S_L; A_R − A_L = lever signal.
What you have reinforced: leaving the food blob totally out of sight also gets V = 0 (only the poor creature never eats and dies…).
And… things will get worse…
The observable quantity V was not appropriate! One should have observed A_R and A_L (but could not).
[Figure: motor neuron (actor) driving eye muscles; synapses = states, weights = values; summed output > 1 = reward, enough to trigger an action]
Observer knows: 1) “This is a down-neuron” (for eye muscles).
Observer knows: 2) There is evidence that the spatial ordering of synapses at a dendrite leads to direction selectivity, and the observer has measured where the synapses are on the dendrite.
Assumptions 1 and 2 correspond to the observer's knowledge of this system; together they lead to observer-induced reinforcement. This is output control!
[Figure: the same down-neuron (for eye muscles) with synapses = states, weights = values, > 1 = reward; the environment here also includes the “optics”; retinal receptive fields; true virtual image motion; really this other synapse should have been reinforced]
This observer lacked the knowledge that the optics of the eye inverts the image.
The observable quantities were appropriate, but the observer lacked knowledge about the inner signal processing in this system.
A first-order fallacy: the observable quantity V was not appropriate! One should have observed A_R and A_L (but could not).
A second-order fallacy: the observable quantities were appropriate, but the observer lacked knowledge about the inner signal processing in the system.
[Figure: motor neuron (actor) with synapses = states, weights = values, retinal receptive fields, environment including the “optics”, and true virtual image motion]
More realistically!
• Think of an engineer having to control the behavior and learning of a complex Mars rover which has many (1000?) simultaneous signals.
– How would you know which signal configuration is at the moment beneficial for behavior and learning? OUTPUT CONTROL WILL NOT WORK.
• Ultimately only the rover can know this.
– But how would it maintain stability to begin with (not be doomed from the start)?
Since observers cannot have complete knowledge of the observed system, we find that output control is fundamentally problematic.
A complex robot-world model requires deep understanding on the side of the designer to define the appropriate reinforcement function(s).
This leads to a large degree of interference.
As a consequence the robot then has the world model of the designer (but not its own) – a slave, not an autonomous agent.
This is input control, and input control will always work!
[Figure: the same motor neuron (actor) with retinal receptive fields and the environment including the “optics”; inputs I and II, spontaneous drive III, and plasticity signal S]
Δρ_I = I ∗ II ∗ S > 0
Δρ_III = III ∗ II ∗ S = 0
Input Control
Control of my input (I, chook, want to feel an egg under my butt):
I, chook, would like to sit on this egg as long as required to hatch.
Control of its output:
I, farmer, would like to get as many eggs as possible and take them away from the chook.
Autonomy: control from inside. Servitude: control from outside.
A fundamental conflict – here a chicken-and-egg problem of type II.
Supervised Learning:
But that's simple, isn't it? Teaching will do it (supervised learning)! You tell me: this is good and that is bad…
Reinforcement Learning – learning from experience while acting in the world:
I tell myself: this is good and that is bad…
This requires a value system in the animal (dopaminergic system; Schultz 1998).
Still: how do we get this in the first place?
a) Bootstrapping problem: evolution does not teach (evaluate).
b) Viewpoint problem: those are the values of the teacher and not of the creature.
c) Complexity problem: supervised learning already requires complex understanding.
Value Systems (in the brain)