
Page 1:

Page 2:

Übungen (Practicals)

http://www.physik3.gwdg.de/cns

4 exercise sheets, one every three weeks, each covering content parallel to the lectures

Required: C++ programming and gnuplot, i.e., the lecture Computational Physics

Submission of programs + 10 min presentation and discussion

Details: first session right after the lecture in the PC pool, Room C.00.110

Page 3:

Learning and Memory

Learning:

Learning Types/Classes and Learning Rules (Overview)

Conceptualizing Learning

Math (Rules, Algorithms, and Convergence) together with the Biological Substrate for different learning rules, and Biological and some other Applications (Pattern Recognition, Robotics, etc.)

Memory:

Theories

Biological Substrate

Integrative Models – towards Cognition

Page 4:

Different Types/Classes of Learning

Unsupervised Learning (non-evaluative feedback)
• Trial-and-error learning.
• No error signal.
• No influence from a teacher; correlation evaluation only.

Reinforcement Learning (evaluative feedback)
• (Classical & instrumental) conditioning, reward-based learning.
• "Good/bad" error signals.
• The teacher defines what is good and what is bad.

Supervised Learning (evaluative error-signal feedback)
• Teaching, coaching, imitation learning, learning from examples, and more.
• Rigorous error signals.
• Direct influence from a teacher/teaching signal.

Page 5:

Overview of the different methods

[Overview diagram spanning Machine Learning, Classical Conditioning, and Synaptic Plasticity. Evaluative feedback (rewards), example-based: Dynamic Programming (Bellman equation), Monte Carlo control, Q-learning, SARSA, TD(λ) with often λ = 0, TD(1), TD(0), eligibility traces, Rescorla/Wagner, neuronal TD models ("Critic"), neuronal TD formalism, Actor/Critic (technical & basal ganglia), neuronal reward systems (basal ganglia). Non-evaluative feedback (correlations), correlation-based: Hebb rule, δ-rule (supervised learning), differential Hebb rule ("fast" and "slow"), STDP models (biophysical & network), ISO learning, ISO model of STDP, ISO control, correlation-based control (non-evaluative), biophysics of synaptic plasticity (dopamine, glutamate, STDP, LTP; LTD = anti). Common theme: anticipatory control of actions and prediction of values through the correlation of signals.]

Page 6:

[Same overview diagram of the different methods as on the previous slide.]

Supervised Learning: Many more methods exist!

Page 7:

The basics and a quick comparison (before the maths really starts)

What can neurons compute? Neurons can compute ONLY correlations!

What can networks compute? Networks can compute anything.

What is the biological substrate for all learning? The synapse/synaptic strength (the connection strength between two neurons).

Page 8:

The Neuroscience Basics as a Six-Slide Crash Course

"I forgot to make a backup of my brain. Everything I had learned last term is gone now."

Page 9:

Human Brain

Cortical Pyramidal Neuron

Page 10:

Structure of a Neuron:

At the dendrites the incoming signals arrive (incoming currents).

At the soma the currents are finally integrated.

At the axon hillock action potentials are generated if the potential crosses the membrane threshold.

The axon transmits (transports) the action potential to distant sites.

At the synapses the outgoing signals are transmitted onto the dendrites of the target neurons.
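
The signal path just described is essentially the integrate-and-fire picture. As a minimal sketch (not from the slides; all parameter values are assumed for illustration), in C++:

    // Minimal leaky integrate-and-fire sketch of the slide's story:
    // inputs are integrated at the "soma"; when the potential crosses
    // the threshold (the "axon hillock"), an action potential is
    // emitted and the potential is reset. Parameters are assumed.
    #include <cstdio>

    int main() {
        const double dt = 0.1, tau_m = 20.0;                 // [ms]
        const double V_rest = -70.0, V_thresh = -54.0;       // [mV]
        const double V_reset = -70.0, R = 10.0, I_in = 2.0;  // drive (assumed)
        double V = V_rest;

        for (int i = 0; i < 2000; ++i) {
            V += dt * ((V_rest - V) + R * I_in) / tau_m;     // integrate inputs
            if (V >= V_thresh) {                             // threshold crossing
                std::printf("spike at t = %.1f ms\n", i * dt);
                V = V_reset;                                 // reset after spike
            }
        }
        return 0;
    }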

Page 11:

Schematic Diagram of a Synapse

[Figure labels: transmitter; receptor ≈ channel]

Terms to remember!

Page 12:

Ion channels:

Ion channels consist of big (protein) molecules which are inserted into the membrane and connect intra- and extracellular space.

Channels act as a resistance against the free flow of ions (electrical resistor R):

I_m = (1/R)(V_m - V_rest) = g(V_m - V_rest)

If V_m = V_rest (resting potential) there is no current flow; the electrical and chemical gradients are balanced (with opposite signs).

Channels are normally ion-selective and open and close depending on the membrane potential (the normal case) but also on (other) ions (e.g., NMDA channels).

Channels exist for: K+, Na+, Ca2+, Cl-
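
A minimal numerical sketch of the channel equation (the conductance value is an assumption, chosen only for illustration):

    // I_m = (1/R)(V_m - V_rest) = g (V_m - V_rest): at V_m = V_rest the
    // current vanishes; away from rest it flows so as to pull the
    // membrane back towards the resting potential.
    #include <cstdio>

    int main() {
        const double g = 0.1;         // channel conductance [uS] (assumed)
        const double V_rest = -70.0;  // resting potential [mV]

        for (double Vm = -90.0; Vm <= -50.0; Vm += 10.0) {
            double Im = g * (Vm - V_rest);  // = (1/R)(Vm - V_rest), R = 1/g
            std::printf("V_m = %6.1f mV  ->  I_m = %6.2f nA\n", Vm, Im);
        }
        return 0;
    }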

Page 13:

What happens at a chemical synapse during signal transmission:

[Figure traces: pre-synaptic action potential; concentration of transmitter in the synaptic cleft; post-synaptic action potential.]

The pre-synaptic action potential depolarises the axon terminals and Ca2+ channels open.

Ca2+ enters the pre-synaptic cell, by which the transmitter vesicles are forced to open and release the transmitter.

Thereby the concentration of transmitter increases in the synaptic cleft and transmitter diffuses to the postsynaptic membrane.

Transmitter-sensitive channels at the postsynaptic membrane open. Na+ and Ca2+ enter, K+ leaves the cell. An excitatory postsynaptic current (EPSC) is thereby generated, which leads to an excitatory postsynaptic potential (EPSP).
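
The transmission chain above ends in an EPSC that the membrane integrates into an EPSP. A common minimal abstraction of this (not given on the slides; time constants and amplitudes are assumed) treats the synaptic conductance as a decaying exponential triggered by the presynaptic spike:

    // A presynaptic spike at t = 0 opens transmitter-sensitive channels
    // (conductance g), producing an EPSC that the passive membrane
    // integrates into an EPSP. All parameters are illustrative.
    #include <cstdio>

    int main() {
        const double dt = 0.1, tau_syn = 2.0, tau_m = 20.0;  // [ms] (assumed)
        const double E_syn = 0.0, V_rest = -70.0;            // [mV]
        double g = 0.0, V = V_rest;

        for (int i = 0; i < 600; ++i) {
            if (i == 0) g += 0.05;                    // spike opens channels
            double epsc = g * (E_syn - V);            // synaptic current
            g += dt * (-g / tau_syn);                 // channels close again
            V += dt * ((V_rest - V) / tau_m + epsc);  // EPSC -> EPSP
            if (i % 100 == 0)
                std::printf("t = %5.1f ms  V = %7.2f mV\n", i * dt, V);
        }
        return 0;
    }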

Page 14:

Information is stored in a neural network by the strength of its synaptic connections.

Up to 10,000 synapses per neuron.

Contact points (synaptic spines) for other neurons; growth or new generation of contact points.

Page 15:

Page 16:

Time Scales

[Figure: time axis from msec over sec, min, hrs, and days to years. Memory: working memory, then short-term memory, then long-term memory. Physiology: activity, short-term plasticity, long-term plasticity, structural plasticity.]

Tetzlaff et al. (2012). Biol. Cybern.

Page 17:

An unsupervised learning rule, the basic Hebb rule (for learning: one input, one output):

dw_i/dt = μ u_i v,  μ << 1

A supervised learning rule (Delta Rule): no input, no output, one error-function derivative, where the error function compares input with output examples:

w_i^(t+1) = w_i^t - μ dE^t/dw_i

A reinforcement learning rule (TD learning): one input, one output, one reward:

w_i^(t+1) = w_i^t + μ [r^(t+1) + γ v^(t+1) - v^t] u_i^t
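
The three updates side by side, as a minimal C++ sketch (illustrative only; the squared-error form assumed for the delta rule is my choice, not stated on the slide):

    #include <vector>

    // Unsupervised (Hebb): dw_i/dt = mu * u_i * v
    void hebb(std::vector<double>& w, const std::vector<double>& u,
              double v, double mu) {
        for (std::size_t i = 0; i < w.size(); ++i)
            w[i] += mu * u[i] * v;
    }

    // Supervised (delta rule): w_i <- w_i - mu * dE/dw_i.
    // Assuming E = 0.5 * (target - v)^2 with v = sum_i w_i * u_i,
    // the gradient is dE/dw_i = -(target - v) * u_i.
    void deltaRule(std::vector<double>& w, const std::vector<double>& u,
                   double v, double target, double mu) {
        for (std::size_t i = 0; i < w.size(); ++i)
            w[i] += mu * (target - v) * u[i];
    }

    // Reinforcement (TD): w_i <- w_i + mu * [r' + gamma * v' - v] * u_i
    void tdRule(std::vector<double>& w, const std::vector<double>& u,
                double v, double vNext, double rNext, double mu, double gamma) {
        double d = rNext + gamma * vNext - v;  // TD error
        for (std::size_t i = 0; i < w.size(); ++i)
            w[i] += mu * d * u[i];
    }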

Page 18:

Correlation-based (Hebbian) learning: how can correlations be learned?

The basic Hebb rule correlates inputs with outputs:

dw_1/dt = μ u_1 v,  μ << 1

[Diagram: input u_1 (x_1) → weight w_1 → output v]

This rule is temporally symmetrical!

Page 19:

Conventional Hebbian Learning

dw_1/dt = μ u_1 v

Symmetrical weight-change curve: [Figure: synaptic change (%) as a function of the timing of pre- (t_Pre) and postsynaptic (t_Post) activity; the curve is symmetric.]

The temporal order of input and output does not play any role.

Page 20:

Our Standard Notation

[Diagram: input u_1 (x_1) → synapse = amplifier with variable weight w_1 → neuron Σ (sums different inputs, here only one) → output v]

Hebbian learning: dw_1/dt = μ u_1 v, the correlation between input and output.

Page 21:

Compare to Reinforcement Learning (RL)

[Diagram: input u_1 (x_1) enters through an eligibility trace E and the synapse w_1 into the neuron Σ, which produces the output v. The output and its derivative v' are combined with the reward r into an error term δ; the weight change correlates δ with the traced input. The correlation part is Hebb!]

Page 22:

What is this eligibility trace E good for?

[Same diagram as before: input u_1 (x_1), trace E, synapse w_1, neuron Σ, output v and derivative v', reward r, error term δ.]

Equation for RL, so-called Temporal Difference (TD) learning:

w_i^(t+1) = w_i^t + μ [r^(t+1) + γ v^(t+1) - v^t] u_i^t
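
A minimal sketch of this update with an explicit eligibility trace (illustrative; the TD(λ)-style trace decay is my assumption, the slide only shows the trace E):

    #include <vector>

    struct TDLearner {
        std::vector<double> w, e;   // weights and eligibility traces
        double mu, gamma, lambda;   // learning rate, discount, trace decay

        TDLearner(std::size_t n, double mu_, double gamma_, double lambda_)
            : w(n, 0.0), e(n, 0.0), mu(mu_), gamma(gamma_), lambda(lambda_) {}

        double value(const std::vector<double>& u) const {
            double v = 0.0;
            for (std::size_t i = 0; i < w.size(); ++i) v += w[i] * u[i];
            return v;
        }

        // One step: decay the traces, add the current input, then let the
        // TD error credit everything that is still "eligible".
        void step(const std::vector<double>& u, double v, double vNext,
                  double rNext) {
            double d = rNext + gamma * vNext - v;      // TD error
            for (std::size_t i = 0; i < w.size(); ++i) {
                e[i] = gamma * lambda * e[i] + u[i];   // eligibility trace
                w[i] += mu * d * e[i];
            }
        }
    };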

Page 23:

I. Pavlov

Classical Conditioning

Page 24:

The reductionist approach of a theoretician: we start by making a single-compartment model of a dog!

[Diagram: the unconditioned stimulus (food) enters with fixed weight w_0 = 1; the conditioned stimulus (bell) enters via a stimulus trace E and a plastic weight w_1 (Δw_1 > 0); both are summed (Σ, ×) to produce the response.]

What is this eligibility trace E good for? The first stimulus needs to be "remembered" in the system.
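
A minimal sketch of such a stimulus trace (illustrative; the leaky-integrator form and time constant are assumptions): a brief bell input leaves a decaying trace E, so a reward arriving later can still be correlated with it.

    #include <cstdio>

    int main() {
        const double dt = 1.0;   // time step [ms] (assumed)
        const double tau = 50.0; // trace time constant [ms] (assumed)
        double E = 0.0;          // stimulus trace

        for (int t = 0; t < 200; ++t) {
            double u = (t == 0) ? 1.0 : 0.0;  // bell: single pulse at t = 0
            E += dt * (-E / tau) + u;         // leaky integration of the input
            if (t % 40 == 0)
                std::printf("t = %3d ms  E = %6.4f\n", t, E);
        }
        return 0;
    }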

Page 25:

TD Learning

[Same diagram as before: input u_1 (x_1), trace E, synapse w_1, neuron Σ, output v and derivative v', reward r, error term δ.]

δ^t = r^(t+1) + γ v^(t+1) - v^t

Condition for convergence: δ = 0, measured at the output of the system (output control).

Page 26:

[Spectrum of approaches, trading off learning speed against autonomy:]

Correlation-based learning: no teacher
Reinforcement learning: indirect influence
Reinforcement learning: direct influence
Supervised learning: teacher
Programming

Page 27:

Open-Loop versus Closed-Loop Systems

[Figure: in the open-loop case the environment acts on the animal; in the closed-loop case animal and environment act on each other in a loop.]

Page 28:

The Difference between Input and Output Control

Output control: through observation of the agent; external value systems; evaluative feedback (Klopf 1988).

Input control: at the agent's own sensors; true internal value systems; non-evaluative feedback.

[Diagram: the designer sends reinforcement to the agent, which interacts with the environment.]

Page 29:

Is this a "real" or just an "academic" problem: why would we want input control?

[Diagram: the designer observes the agent in its environment and may deliver the wrong reinforcement. Output control: through observation of the agent. Input control: at the agent's own sensors.]

Are we observing the right aspects of the behaviour? (The right output variables?)

The output-control paradigm can and does lead to major problems in reinforcement learning, because the wrong behaviour might be reinforced.

Page 30:

A "funny" example

[Diagram: an agent with its own goals (me!) lecturing to the environment (you!). Speech goes out; my perception of your attentiveness comes back. This input-control loop allows me to control my lecture and to learn to improve. Prior knowledge and shared goals help! A "Martian" observer with other sensors and/or other goals (e.g., body heat) instead reinforces my running around.]

Page 31:

Relevance for learning:

1) Use output control to get a system that does what YOU want (an engineering system).

2) Use input control to get an autonomous (biologically motivated) system.

Other considerations: [the same learning-speed versus autonomy spectrum as before, from correlation-based learning (no teacher) through reinforcement learning (indirect and direct influence) and supervised learning (teacher) to programming.]

Page 32:

• Good ending point

Page 33:

Is this a real or just an academic problem?

[Diagram: an observer observes an observable quantity V of a behaving agent (the system) and reinforces it.]

What is the desired (most often occurring) output state? Zero!

[Plot: some output V fluctuating around 0.]

Page 34:

The Situation

[Diagram: the observer/controller observes V and controls a lever acting on the observed system.]

Page 35:

Experiment: assume you have one lever by which you can try to drive V towards zero whenever it suddenly deviates.

[Plot: a trace of V around 0 and several lever trajectories. Here are some valid solutions for a V = 0 reinforcement. How should the lever be moved?]

Page 36:

Obviously V = 0 can easily be obtained when the lever simply follows V!

Page 37:

The System: a Braitenberg Vehicle (V. Braitenberg (1984), "Vehicles: Experiments in Synthetic Psychology")

[Diagram: a vehicle with motors & wheels and two range sensors. The sensor signals S_R, S_L are connected 1:1 to the motor signals A_R, A_L. A sensible stimulus is present: the yellow food blob.]

What the agent wanted to learn was to approach the yellow food blob and eat it. The desired solution: S_R = S_L = 0.

Output signal V = S_R - S_L; A_R - A_L = lever signal.
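
A minimal sketch of this vehicle (the cosine sensor model and all numbers are my own assumptions, only to make the point concrete): V = S_R - S_L is zero both when the vehicle faces the stimulus and when the stimulus is completely out of view, which is exactly the failure mode shown on the next slide.

    #include <cstdio>
    #include <cmath>

    // Hypothetical sensor: response falls off with the angle between the
    // sensor axis and the stimulus direction; zero outside the field of view.
    double sensor(double stimulusAngle, double sensorAngle) {
        double d = stimulusAngle - sensorAngle;
        return (std::fabs(d) < 1.0) ? std::cos(d) : 0.0;
    }

    int main() {
        for (double a = -3.0; a <= 3.0; a += 1.5) {  // stimulus direction [rad]
            double SR = sensor(a, +0.5), SL = sensor(a, -0.5);
            double AR = SR, AL = SL;                 // the 1:1 connections
            double V = SR - SL;                      // the observed quantity
            std::printf("angle = %+.1f  V = %+.3f  AR-AL = %+.3f\n",
                        a, V, AR - AL);
        }
        return 0;
    }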

Page 38:

What you have reinforced: S_R = S_L = 0.

Leaving the food blob totally out of sight also gets V = 0 (only the poor creature never eats and dies…).

And… things will get worse…

The observable quantity V was not appropriate! One should have observed A_R and A_L (but could not).

Page 39:

Page 40:

The observer knows: 1) "This is a down-neuron" (for the eye muscles). 2) "There is evidence that the spatial ordering of synapses at a dendrite leads to direction selectivity, and the observer has measured where the synapses are on the dendrite."

[Diagram: a motor neuron (actor); synapses = states, weights = values; a summed input > 1 = reward, enough to trigger an action.]

Assumptions 1 and 2 correspond to the observer's knowledge of this system and lead to observer-induced reinforcement. This is output control!

Page 41:

Observer: "This is a down-neuron" (for the eye muscles).

[Diagram as before: retinal receptive fields, the environment (here also including the "optics"), the motor neuron (actor); synapses = states, weights = values; > 1 = reward.]

This observer lacked the knowledge that the optics of the eye inverts the image. Given the true virtual image motion, really this synapse should have been reinforced.

Page 42:

A first-order fallacy: the observable quantity V was not appropriate! One should have observed A_R, A_L (but could not).

A second-order fallacy: the observable quantities were appropriate, but the observer lacked knowledge about the inner signal processing in this system.

[Diagram as before: observer, retinal receptive fields, environment (here also the "optics"), motor neuron (actor), true virtual image motion.]

Page 43:

More realistically!

• Think of an engineer having to control the behavior and learning of a complex Mars rover which has many (1000?) simultaneous signals. How would you know which signal configuration is at the moment beneficial for behavior and learning?

OUTPUT CONTROL WILL NOT WORK

• Ultimately only the rover itself can know this. But how would it maintain stability to begin with (and not be doomed from the start)?

Page 44:

Since observers cannot have complete knowledge of the observed system, we find that output control is fundamentally problematic.

A complex robot-world model requires deep understanding on the side of the designer to define the appropriate reinforcement function(s). This leads to a large degree of interference.

As a consequence, the robot then has the world model of the designer (but not its own): a slave, not an autonomous agent.

Page 45:

Input Control

[Diagram as before: retinal receptive fields feed the motor neuron (actor) via the pathways I and II; III is a spontaneous drive; the environment here also includes the "optics".]

Δr_I = I · II · S > 0
Δr_III = III · II · S = 0

This is input control. Input control will always work!

Page 46:

The Chicken-Egg Problem Type I

Which came first: Chicken or Egg?

Page 47:

Control of my input (I, chook, want to feel an egg under my butt): I, chook, would like to sit on this egg as long as required to hatch it. Autonomy: control from inside.

Control of its output: I, farmer, would like to get as many eggs as possible and take them away from the chook. Servitude: control from outside.

A fundamental conflict, here a chicken-egg problem of type II.

Page 48:

But that's simple, isn't it: teaching will do it (supervised learning)!

Supervised learning: you tell me, this is good and that is bad…

Reinforcement learning (learning from experience while acting in the world): I tell myself, this is good and that is bad… This requires a value system in the animal (the dopaminergic system; Schultz 1998).

Value systems (in the brain). Still: how do we get this in the first place?

a) Bootstrapping problem: evolution does not teach (evaluate).

b) Viewpoint problem: those are the values of the teacher and not of the creature.

c) Complexity problem: supervised learning already requires complex understanding.

Page 49:

The Problem: How to bootstrap a Value System?

Evolve it! [Diagram: evolution shapes the values of an animal living in the world.] Fully situated, but takes long.

Design it! [Diagram: the designer equips a robot with values derived from the designer's world-view.] Badly situated, but can be achieved quickly.