statistical learning and optimal control: a framework for biological learning and motor control
Statistical learning and optimal control: A framework for biological learning and motor control. Lecture 3: Introduction to optimal control. Reza Shadmehr, Johns Hopkins School of Medicine.
![Page 1: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/1.jpg)
Statistical learning and optimal control:
A framework for biological learning and motor control
Lecture 3: Introduction to optimal control
Reza Shadmehr
Johns Hopkins School of Medicine
![Page 2: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/2.jpg)
[Figure: graphical model for estimation with nodes w*, x, y on each trial, and graphical model for control with nodes r, x(0), u(0), x(1), y(1), …, u(p-1), x(p), y(p).]
Estimation: Given observations x and y, estimate the hidden state w. Your estimates have no bearing on your observations. Example: classical conditioning. The actions of the learner have no effect on the stimuli.
Control: Figure out the u that you need to give so that your observations y behave as you want them to. Example: operant conditioning, where the learner’s actions affect whether it gets rewarded or not.
![Page 3: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/3.jpg)
The linear quadratic tracking problem
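Reference trajectory: $\mathbf{r}^{(k)}$
$$
\begin{aligned}
\mathbf{y}^{(k)} &= B\,\mathbf{x}^{(k)} \\
\mathbf{x}^{(k+1)} &= A\,\mathbf{x}^{(k)} + C\,\mathbf{u}^{(k)}
\end{aligned}
$$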
We are trying to track a reference trajectory r(k)
We observe y(k), which is related to x(k)
We generate command u(k), which causes a change in x(k)
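We wish to find the control sequence u(0), u(1), …, u(p-1) that minimizes the cost function
$$
J = \sum_{k=0}^{p-1} \underbrace{\left(\mathbf{y}^{(k+1)} - \mathbf{r}^{(k+1)}\right)^T T^{(k+1)} \left(\mathbf{y}^{(k+1)} - \mathbf{r}^{(k+1)}\right)}_{\text{tracking cost}} + \underbrace{\mathbf{u}^{(k)T} L^{(k)}\, \mathbf{u}^{(k)}}_{\text{control cost}}
$$
given the constraint that:
$$
\begin{aligned}
\mathbf{x}^{(k+1)} &= A\,\mathbf{x}^{(k)} + C\,\mathbf{u}^{(k)} \\
\mathbf{y}^{(k+1)} &= B\,\mathbf{x}^{(k+1)}
\end{aligned}
$$
Dimensions: y and r are q×1, x is m×1, u is n×1.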
![Page 4: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/4.jpg)
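Suppose we have a linear dynamical system:
$$\mathbf{x}^{(k+1)} = A\,\mathbf{x}^{(k)} + C\,\mathbf{u}^{(k)}$$
where x is m×1, u is n×1, A is m×m, and C is m×n. We have the history of inputs u(k), where k=0…p-1. We want to write the history of state x(k).
$$
\begin{aligned}
\mathbf{x}^{(1)} &= A\mathbf{x}^{(0)} + C\mathbf{u}^{(0)} \\
\mathbf{x}^{(2)} &= A\mathbf{x}^{(1)} + C\mathbf{u}^{(1)} = A^2\mathbf{x}^{(0)} + AC\mathbf{u}^{(0)} + C\mathbf{u}^{(1)} \\
\mathbf{x}^{(3)} &= A\mathbf{x}^{(2)} + C\mathbf{u}^{(2)} = A^3\mathbf{x}^{(0)} + A^2C\mathbf{u}^{(0)} + AC\mathbf{u}^{(1)} + C\mathbf{u}^{(2)} \\
\mathbf{x}^{(k)} &= A^k\mathbf{x}^{(0)} + \sum_{j=0}^{k-1} A^{k-1-j} C\,\mathbf{u}^{(j)}
\end{aligned}
$$
Stack the states and the commands into history vectors:
$$
\mathbf{x}_h = \begin{bmatrix}\mathbf{x}^{(1)} \\ \mathbf{x}^{(2)} \\ \vdots \\ \mathbf{x}^{(p)}\end{bmatrix}, \qquad
\mathbf{u}_h = \begin{bmatrix}\mathbf{u}^{(0)} \\ \mathbf{u}^{(1)} \\ \vdots \\ \mathbf{u}^{(p-1)}\end{bmatrix}
$$
$$
E = \begin{bmatrix} A \\ A^2 \\ A^3 \\ \vdots \\ A^p \end{bmatrix}, \qquad
F = \begin{bmatrix} I & 0 & 0 & \cdots & 0 \\ A & I & 0 & \cdots & 0 \\ A^2 & A & I & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ A^{p-1} & \cdots & A^2 & A & I \end{bmatrix}
\begin{bmatrix} C & 0 & \cdots & 0 \\ 0 & C & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & C \end{bmatrix}
= \begin{bmatrix} C & 0 & 0 & \cdots & 0 \\ AC & C & 0 & \cdots & 0 \\ A^2C & AC & C & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ A^{p-1}C & \cdots & A^2C & AC & C \end{bmatrix}
$$
$$\mathbf{x}_h = E\,\mathbf{x}^{(0)} + F\,\mathbf{u}_h$$
Dimensions: x_h is (p·m)×1, u_h is (p·n)×1, E is (p·m)×m, and F is (p·m)×(p·n), formed as the product of a (p·m)×(p·m) matrix and a (p·m)×(p·n) matrix.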
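The stacking of the state history into a single matrix equation can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `build_history_matrices` and the small two-state system are hypothetical and chosen only for illustration. It builds E and F and compares E x(0) + F u_h against a step-by-step simulation of the dynamics.

```python
import numpy as np

def build_history_matrices(A, C, p):
    """Return E (p*m x m) and F (p*m x p*n) such that x_h = E @ x0 + F @ u_h."""
    m, n = C.shape
    powers = [np.eye(m)]
    for _ in range(p):
        powers.append(powers[-1] @ A)        # powers[i] holds A^i
    E = np.zeros((p * m, m))
    F = np.zeros((p * m, p * n))
    for i in range(p):                       # block row i describes x^(i+1)
        E[i * m:(i + 1) * m, :] = powers[i + 1]
        for j in range(i + 1):               # u^(j) reaches x^(i+1) through A^(i-j) C
            F[i * m:(i + 1) * m, j * n:(j + 1) * n] = powers[i - j] @ C
    return E, F

# Hypothetical 2-state, 1-input system (illustrative values only).
A = np.array([[1.0, 0.1], [-0.3, 0.9]])
C = np.array([[0.0], [0.1]])
p = 5
x0 = np.array([1.0, -0.5])
u_h = np.arange(1.0, p + 1.0)                # u^(0), ..., u^(p-1) stacked (n = 1)

E, F = build_history_matrices(A, C, p)
x_h = E @ x0 + F @ u_h

# Step-by-step simulation of x^(k+1) = A x^(k) + C u^(k) for comparison.
x = x0.copy()
x_sim = []
for k in range(p):
    x = A @ x + C @ np.array([u_h[k]])
    x_sim.append(x)
x_sim = np.concatenate(x_sim)
max_error = np.max(np.abs(x_h - x_sim))
print("largest mismatch:", max_error)
```

The nested loop is written for readability rather than speed; for long horizons a vectorized construction would be preferable.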
![Page 5: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/5.jpg)
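Total cost:
$$
\begin{aligned}
\mathbf{y}_h &= G\,\mathbf{x}_h \\
J &= (\mathbf{y}_h - \mathbf{r}_h)^T T (\mathbf{y}_h - \mathbf{r}_h) + \mathbf{u}_h^T L\, \mathbf{u}_h \\
J &= \underbrace{(G\mathbf{x}_h - \mathbf{r}_h)^T T (G\mathbf{x}_h - \mathbf{r}_h)}_{\text{tracking cost}} + \underbrace{\mathbf{u}_h^T L\, \mathbf{u}_h}_{\text{control cost}}
\end{aligned}
$$
Constraint:
$$\mathbf{x}_h = E\,\mathbf{x}^{(0)} + F\,\mathbf{u}_h$$
where
$$
\mathbf{r}_h = \begin{bmatrix}\mathbf{r}^{(1)} \\ \mathbf{r}^{(2)} \\ \vdots \\ \mathbf{r}^{(p)}\end{bmatrix}, \quad
\mathbf{y}_h = \begin{bmatrix}\mathbf{y}^{(1)} \\ \mathbf{y}^{(2)} \\ \vdots \\ \mathbf{y}^{(p)}\end{bmatrix}, \quad
T = \begin{bmatrix} T^{(1)} & 0 & \cdots & 0 \\ 0 & T^{(2)} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & T^{(p)} \end{bmatrix}, \quad
L = \begin{bmatrix} L^{(0)} & 0 & \cdots & 0 \\ 0 & L^{(1)} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & L^{(p-1)} \end{bmatrix}, \quad
G = \begin{bmatrix} B & 0 & \cdots & 0 \\ 0 & B & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & B \end{bmatrix}
$$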
![Page 6: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/6.jpg)
Constraint minimization with Lagrange multipliers: Example 1
Suppose we want to find the point (xs,ys) along the line y=mx+b that is closest to the point (xo,yo).
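Cost:
$$
\text{distance} = \left[(x - x_0)^2 + (y - y_0)^2\right]^{1/2}, \qquad
J(x, y) = (x - x_0)^2 + (y - y_0)^2
$$
Constraint:
$$y = mx + b \quad\Longleftrightarrow\quad g(x, y) = y - mx - b = 0$$
[Figure: the line y = mx + b in the (x, y) plane, the point (x0, y0), contours of constant cost around it, and the solution (xs, ys) on the line.]
The points along each contour are of equal cost.
We want to find the point (xs, ys) that belongs to the line and, among the points that belong to the line, gives us the smallest cost.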
![Page 7: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/7.jpg)
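Vector normal to the constraint g(x, y) = 0:
$$\begin{bmatrix} dg/dx \\ dg/dy \end{bmatrix} = \begin{bmatrix} -m \\ 1 \end{bmatrix}$$
Vector normal to the cost contour J(x, y) = c:
$$\begin{bmatrix} dJ/dx \\ dJ/dy \end{bmatrix} = \begin{bmatrix} 2(x - x_0) \\ 2(y - y_0) \end{bmatrix}$$
The point where the line meets the cost contour is where the vector normal to the constraint and the vector normal to the cost are in the same direction.
The point that we are looking for satisfies the condition:
$$\begin{bmatrix} dJ/dx \\ dJ/dy \end{bmatrix} = \lambda \begin{bmatrix} dg/dx \\ dg/dy \end{bmatrix}$$
where λ is the Lagrange multiplier.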
![Page 8: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/8.jpg)
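$$
\begin{bmatrix} dJ/dx \\ dJ/dy \end{bmatrix} = \begin{bmatrix} 2(x - x_0) \\ 2(y - y_0) \end{bmatrix}, \qquad
\begin{bmatrix} dg/dx \\ dg/dy \end{bmatrix} = \begin{bmatrix} -m \\ 1 \end{bmatrix}
$$
$$\begin{bmatrix} 2(x - x_0) \\ 2(y - y_0) \end{bmatrix} = \lambda \begin{bmatrix} -m \\ 1 \end{bmatrix}$$
This is 2 equations with 3 unknowns.
$$y - mx - b = 0$$
Here is our 3rd equation.
$$
\begin{aligned}
x_s &= (m^2 + 1)^{-1}\,(m y_0 + x_0 - m b) \\
y_s &= m x_s + b \\
\lambda &= 2\,(y_s - y_0)
\end{aligned}
$$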
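The closest-point example can be verified with a few lines of arithmetic. The sketch below assumes the constraint is written as g(x, y) = y - mx - b = 0 and that the multiplier is defined by requiring the gradient of J to equal λ times the gradient of g; the function name and the numerical values are hypothetical.

```python
def closest_point_on_line(m, b, x0, y0):
    """Solve 2(x - x0) = -lam * m, 2(y - y0) = lam, y = m * x + b for (x, y, lam)."""
    xs = (m * y0 + x0 - m * b) / (m ** 2 + 1.0)
    ys = m * xs + b
    lam = 2.0 * (ys - y0)
    return xs, ys, lam

def squared_distance(x, y, x0, y0):
    return (x - x0) ** 2 + (y - y0) ** 2

# Illustrative line y = 2x + 1 and point (3, 0).
m, b = 2.0, 1.0
x0, y0 = 3.0, 0.0
xs, ys, lam = closest_point_on_line(m, b, x0, y0)

# Residuals of the two gradient conditions; both should vanish.
residual_x = 2.0 * (xs - x0) + lam * m
residual_y = 2.0 * (ys - y0) - lam

# Neighbouring points on the line should not be closer than (xs, ys).
J_best = squared_distance(xs, ys, x0, y0)
J_left = squared_distance(xs - 0.01, m * (xs - 0.01) + b, x0, y0)
J_right = squared_distance(xs + 0.01, m * (xs + 0.01) + b, x0, y0)
print(xs, ys, lam)
```

With these values the solution is approximately (0.2, 1.4) with λ ≈ 2.8.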
![Page 9: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/9.jpg)
Constraint minimization with Lagrange multipliers: Example 2
Suppose the milkmaid wants to get to the cow so that she travels the shortest distance possible, given the constraint that she first washes her milk pan in the river. So we want to find the shortest route that includes a line from the milkmaid to the river edge, and a line from the river edge to the cow. Find the point P that minimizes the following:
Cost:
$$f(x, y) = \left\| \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} x_m \\ y_m \end{bmatrix} \right\| + \left\| \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} x_c \\ y_c \end{bmatrix} \right\|$$
Constraint (the river edge):
$$g(x, y) = 0$$
[Figure: milkmaid M, cow C, and the river edge in the (x, y) plane, with ellipses whose foci are M and C.]
Example from: Steuard Jensen
An ellipse can be defined as the set of points P for which the total distance from one focus to P and then to the other focus is constant.
If we keep points M and C as the foci of this ellipse, then as soon as we have an ellipse that touches the river edge, we have found the point P that is our solution.
Note that at point P, the normal vector to g(x, y) and the normal vector to f(x, y) are in the same direction:
$$\begin{bmatrix} df(x,y)/dx \\ df(x,y)/dy \end{bmatrix} = \lambda \begin{bmatrix} dg(x,y)/dx \\ dg(x,y)/dy \end{bmatrix}$$
![Page 10: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/10.jpg)
Constraint minimization with Lagrange multipliers
A scalar constraint
In order to minimize the scalar function:
$$J(\mathbf{x})$$
Subject to scalar constraint:
$$g(\mathbf{x}) = 0$$
We form an augmented cost:
$$J_a(\mathbf{x}) = J(\mathbf{x}) + \lambda\, g(\mathbf{x})$$
Note that when we find the x that satisfies the constraint, g(x) will be zero and so we have not changed our cost function.
To minimize the augmented cost, we have:
$$\frac{dJ_a}{d\mathbf{x}} = \frac{dJ}{d\mathbf{x}} + \lambda \frac{dg}{d\mathbf{x}} = 0, \qquad \frac{dJ_a}{d\lambda} = g(\mathbf{x}) = 0$$
So, to find the x that minimizes the cost subject to the constraint, we find the (x, lambda) that satisfies:
$$\frac{dJ_a}{d\mathbf{x}} = 0, \qquad \frac{dJ_a}{d\lambda} = 0$$
This should look familiar from the last two examples.
![Page 11: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/11.jpg)
Constraint minimization with Lagrange multipliers
A “vector” constraint
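In order to minimize the scalar function:
$$J(\mathbf{x})$$
Subject to constraint:
$$\mathbf{g}(\mathbf{x}) = \begin{bmatrix} g_1(\mathbf{x}) \\ g_2(\mathbf{x}) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
We form an augmented cost with two Lagrange multipliers:
$$J_a(\mathbf{x}) = J(\mathbf{x}) + \boldsymbol{\lambda}^T \mathbf{g}(\mathbf{x}), \qquad \boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix}$$
To minimize the augmented cost, we have:
$$\frac{dJ_a}{d\mathbf{x}} = 0, \qquad \frac{dJ_a}{d\lambda_1} = 0, \qquad \frac{dJ_a}{d\lambda_2} = 0$$
So, to find the x that minimizes the cost subject to the constraint, we find the (x, lambda1, lambda2) that satisfies:
$$
\begin{aligned}
\frac{dJ_a}{d\mathbf{x}} &= \frac{dJ}{d\mathbf{x}} + \frac{d\mathbf{g}}{d\mathbf{x}}\,\boldsymbol{\lambda} = 0 \\
\frac{dJ_a}{d\lambda_1} &= g_1(\mathbf{x}) = 0 \\
\frac{dJ_a}{d\lambda_2} &= g_2(\mathbf{x}) = 0
\end{aligned}
$$
Here dg/dx is the matrix whose columns are the gradients of g1 and g2 with respect to x.
We have as many multipliers as we have constraints.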
![Page 12: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/12.jpg)
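Cost: a scalar
$$J(\mathbf{x}_h, \mathbf{u}_h) = (G\mathbf{x}_h - \mathbf{r}_h)^T T (G\mathbf{x}_h - \mathbf{r}_h) + \mathbf{u}_h^T L\, \mathbf{u}_h$$
Constraint: a vector
$$\mathbf{g}(\mathbf{x}_h, \mathbf{u}_h) = \mathbf{x}_h - E\mathbf{x}^{(0)} - F\mathbf{u}_h = \mathbf{0}$$
Augmented cost:
$$J_a = (G\mathbf{x}_h - \mathbf{r}_h)^T T (G\mathbf{x}_h - \mathbf{r}_h) + \mathbf{u}_h^T L\, \mathbf{u}_h + \boldsymbol{\lambda}^T \left(\mathbf{x}_h - E\mathbf{x}^{(0)} - F\mathbf{u}_h\right)$$
We use the identity
$$\frac{d}{d\mathbf{x}}\left(\mathbf{x}^T A\, \mathbf{x}\right) = A\mathbf{x} + A^T\mathbf{x}$$
and take T and L to be symmetric:
$$
\begin{aligned}
\frac{dJ_a}{d\mathbf{x}_h} &= 2G^T T G\,\mathbf{x}_h - 2G^T T\,\mathbf{r}_h + \boldsymbol{\lambda} = \mathbf{0} && \text{(Eq. 1)} \\
\frac{dJ_a}{d\mathbf{u}_h} &= 2L\,\mathbf{u}_h - F^T\boldsymbol{\lambda} = \mathbf{0} && \text{(Eq. 2)} \\
\frac{dJ_a}{d\boldsymbol{\lambda}} &= \mathbf{g}(\mathbf{x}_h, \mathbf{u}_h) = \mathbf{0}
\end{aligned}
$$
Solve for lambda in Eq. 1, and then plug it into Eq. 2.
$$
\begin{aligned}
\boldsymbol{\lambda} &= 2G^T T\left(\mathbf{r}_h - G\mathbf{x}_h\right) = 2G^T T\left(\mathbf{r}_h - GE\mathbf{x}^{(0)} - GF\mathbf{u}_h\right) \\
2L\,\mathbf{u}_h &= F^T\boldsymbol{\lambda} = 2F^T G^T T\left(\mathbf{r}_h - GE\mathbf{x}^{(0)} - GF\mathbf{u}_h\right) \\
\left(L + F^T G^T T G F\right)\mathbf{u}_h &= F^T G^T T\left(\mathbf{r}_h - GE\mathbf{x}^{(0)}\right) \\
\mathbf{u}_h &= \left(L + F^T G^T T G F\right)^{-1} F^T G^T T\left(\mathbf{r}_h - GE\mathbf{x}^{(0)}\right)
\end{aligned}
$$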
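The closed-form command history can be sanity-checked numerically. The sketch below assumes NumPy, symmetric weights T and L, and a hypothetical two-state system with a single command; the helper names (`block_diag`, `history_matrices`, `total_cost`) are illustrative and not from the lecture. It solves the linear system for u_h, confirms that the gradient of the cost with respect to u_h vanishes, and checks that a perturbed command history costs more.

```python
import numpy as np

def block_diag(blocks):
    """Place the given 2-D arrays along the diagonal of a zero matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    i = j = 0
    for b in blocks:
        out[i:i + b.shape[0], j:j + b.shape[1]] = b
        i += b.shape[0]
        j += b.shape[1]
    return out

def history_matrices(A, C, p):
    """E and F such that x_h = E @ x0 + F @ u_h."""
    m, n = C.shape
    powers = [np.eye(m)]
    for _ in range(p):
        powers.append(powers[-1] @ A)
    E = np.vstack(powers[1:])
    F = np.zeros((p * m, p * n))
    for i in range(p):
        for j in range(i + 1):
            F[i * m:(i + 1) * m, j * n:(j + 1) * n] = powers[i - j] @ C
    return E, F

def total_cost(u_h, x0, E, F, G, T, L, r_h):
    """Tracking cost plus control cost for a given command history."""
    err = G @ (E @ x0 + F @ u_h) - r_h
    return err @ T @ err + u_h @ L @ u_h

# Hypothetical system and weights (illustrative values only).
A = np.array([[1.0, 0.1], [-0.2, 0.9]])
C = np.array([[0.0], [0.1]])
B = np.array([[1.0, 0.0]])
p = 10
x0 = np.zeros(2)
E, F = history_matrices(A, C, p)
G = np.kron(np.eye(p), B)                      # block-diagonal copies of B
# Penalize tracking error only at the final time point.
T = block_diag([np.array([[100.0 if k == p else 0.0]]) for k in range(1, p + 1)])
L = block_diag([np.array([[0.01]])] * p)
r_h = np.ones(p)

# u_h = (L + F'G'TGF)^{-1} F'G'T (r_h - G E x0), solved without forming the inverse.
M = L + F.T @ G.T @ T @ G @ F
u_opt = np.linalg.solve(M, F.T @ G.T @ T @ (r_h - G @ E @ x0))

x_h = E @ x0 + F @ u_opt
gradient = 2 * L @ u_opt - 2 * F.T @ G.T @ T @ (r_h - G @ x_h)
J_opt = total_cost(u_opt, x0, E, F, G, T, L, r_h)
J_perturbed = total_cost(u_opt + 0.1, x0, E, F, G, T, L, r_h)
print(J_opt, J_perturbed)
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical-stability choice; the result is the same command history.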
![Page 13: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/13.jpg)
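Let us construct a simple model of the eye’s dynamics and produce a saccade using optimal control.
[Figure: a mass m at position x, attached to a bottom spring of stiffness k/2, a top spring of stiffness k/2 anchored at x0, and a viscous element b, and driven by the motor command u. On the right, the equivalent system with a single spring k, a viscous element b, and the motor command u.]
Force in the bottom spring: $-\frac{k}{2}x$
Force in the viscous element: $-b\dot{x}$
Force in the top spring: $\frac{k}{2}(x_0 - x)$
Force in the motor command: $u$
$$
\begin{aligned}
m\ddot{x} &= -\frac{k}{2}x - b\dot{x} + u + \frac{k}{2}(x_0 - x) \\
&= -kx + \frac{k}{2}x_0 - b\dot{x} + u \\
&= -k\left(x - \frac{x_0}{2}\right) - b\dot{x} + u
\end{aligned}
$$
If we re-define x so that we measure it from xo/2, then the equivalent system is shown on the right, where the equilibrium point of the spring is at x=0.
$$m\ddot{x} = -kx - b\dot{x} + u$$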
![Page 14: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/14.jpg)
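System dynamics in continuous form:
$$
\begin{aligned}
m\ddot{x} &= -kx - b\dot{x} + u \\
\ddot{x} &= -\frac{k}{m}x - \frac{b}{m}\dot{x} + \frac{1}{m}u
\end{aligned}
$$
$$x_1 \equiv x, \qquad x_2 \equiv \dot{x}$$
$$
\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} =
\begin{bmatrix} 0 & 1 \\ -\frac{k}{m} & -\frac{b}{m} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} +
\begin{bmatrix} 0 \\ \frac{1}{m} \end{bmatrix} u
\qquad\Longleftrightarrow\qquad
\dot{\mathbf{x}} = A_c\,\mathbf{x} + C_c\,u
$$
Our observation:
$$y = \begin{bmatrix} 1 & 0 \end{bmatrix}\mathbf{x}$$
Goal: find the motor commands that move the mass (the eye) to a certain location by a certain time while minimizing a cost that depends on endpoint accuracy and motor commands.
First step: re-formulate the system dynamics from continuous to discrete time.
Second step: solve the optimal control problem.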
![Page 15: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/15.jpg)
Relating discrete and continuous representation of a linear system(approximate solution)
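Continuous system:
$$
\begin{aligned}
\dot{\mathbf{x}}(t) &= A_c\,\mathbf{x}(t) + C_c\,\mathbf{u}(t) \\
\mathbf{y}(t) &= B_c\,\mathbf{x}(t) + D_c\,\mathbf{u}(t)
\end{aligned}
$$
Discrete system:
$$
\begin{aligned}
\mathbf{x}^{(k+1)} &= A_d\,\mathbf{x}^{(k)} + C_d\,\mathbf{u}^{(k)} \\
\mathbf{y}^{(k)} &= B_d\,\mathbf{x}^{(k)} + D_d\,\mathbf{u}^{(k)}
\end{aligned}
$$
A simple (but approximate) method is to use Euler’s approximation:
$$
\begin{aligned}
\dot{\mathbf{x}}(t) &\approx \frac{\mathbf{x}(t + \Delta t) - \mathbf{x}(t)}{\Delta t} \\
\mathbf{x}(t + \Delta t) &= \mathbf{x}(t) + \dot{\mathbf{x}}(t)\,\Delta t \\
&= \left(A_c\,\mathbf{x}(t) + C_c\,\mathbf{u}(t)\right)\Delta t + \mathbf{x}(t) \\
&= \left(I + A_c\,\Delta t\right)\mathbf{x}(t) + C_c\,\Delta t\,\mathbf{u}(t)
\end{aligned}
$$
$$A_d = I + A_c\,\Delta t, \qquad C_d = C_c\,\Delta t, \qquad B_d = B_c, \qquad D_d = D_c$$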
![Page 16: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/16.jpg)
Solution of continuous LTI state equations (scalar condition)
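Suppose that our state is a scalar variable and the state update equation is of the form:
$$\dot{x}(t) = a\,x(t)$$
The solution will have the exponential form:
$$x(t) = x(0)\exp(at)$$
Suppose that our state is a scalar variable and the state update depends on an external input u(t):
$$
\begin{aligned}
\dot{x}(t) &= a\,x(t) + c\,u(t) \\
\exp(-at)\,\dot{x}(t) &= \exp(-at)\,a\,x(t) + \exp(-at)\,c\,u(t) \\
\frac{d}{dt}\left[\exp(-at)\,x(t)\right] &= \exp(-at)\,c\,u(t) \\
\int_0^{\tau} d\left[\exp(-at)\,x(t)\right] &= \int_0^{\tau} \exp(-at)\,c\,u(t)\,dt \\
\exp(-a\tau)\,x(\tau) - x(0) &= \int_0^{\tau} \exp(-at)\,c\,u(t)\,dt \\
x(\tau) &= \exp(a\tau)\,x(0) + \exp(a\tau)\int_0^{\tau} \exp(-at)\,c\,u(t)\,dt \\
x(\tau) &= \exp(a\tau)\,x(0) + \int_0^{\tau} \exp\left(a(\tau - t)\right)c\,u(t)\,dt
\end{aligned}
$$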
![Page 17: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/17.jpg)
Matrix exponential
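Suppose that our state is a vector variable:
$$\dot{\mathbf{x}}(t) = A\,\mathbf{x}(t)$$
We can imagine that the solution will have a “matrix exponential” form:
$$\mathbf{x}(t) = \exp(At)\,\mathbf{x}(0)$$
For any square matrix A, the matrix exponential exp(A) is a square matrix function. We can compute it using Taylor series expansion.
In Matlab, exp(A) is computed as expm(A).
In Mathematica, use MatrixExp[A].
$$
\begin{aligned}
f(t) &= f(0) + \left.\frac{df}{dt}\right|_{t=0} t + \left.\frac{d^2 f}{dt^2}\right|_{t=0}\frac{t^2}{2!} + \cdots + \left.\frac{d^n f}{dt^n}\right|_{t=0}\frac{t^n}{n!} + \cdots \\
\exp(At) &= I + At + A^2\frac{t^2}{2!} + \cdots + A^n\frac{t^n}{n!} + \cdots \\
\exp(At) &= \sum_{n=0}^{\infty} A^n \frac{t^n}{n!}
\end{aligned}
$$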
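The Taylor series definition translates directly into code. The following is a minimal sketch, assuming NumPy; the function name `expm_taylor` and the truncation at 30 terms are assumptions made for illustration, adequate only when the entries of At are modest in size. In practice a library routine (such as the Matlab and Mathematica functions named above, or `scipy.linalg.expm` in Python) is the safer choice. The example uses a matrix whose exponential is known in closed form: for A = [[0, -1], [1, 0]], exp(Aθ) is the rotation matrix by angle θ.

```python
import numpy as np

def expm_taylor(A, t=1.0, n_terms=30):
    """Approximate exp(A t) by the truncated series sum_n (A t)^n / n!."""
    At = np.asarray(A, dtype=float) * t
    result = np.eye(At.shape[0])
    term = np.eye(At.shape[0])
    for n in range(1, n_terms):
        term = term @ At / n          # term now equals (A t)^n / n!
        result = result + term
    return result

theta = 0.7
A = np.array([[0.0, -1.0], [1.0, 0.0]])
R = expm_taylor(A, theta)
R_exact = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
series_error = np.max(np.abs(R - R_exact))

# exp(A (t1 + t2)) should equal exp(A t1) exp(A t2).
product_error = np.max(np.abs(expm_taylor(A, 0.3) @ expm_taylor(A, 0.4) - R))
print(series_error, product_error)
```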
![Page 18: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/18.jpg)
Some properties of the matrix exponential
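Using Taylor series expansion, one can show the following properties of the matrix exponential:
$$
\begin{aligned}
\exp(A \cdot 0) &= I \\
\exp\left(A(t_1 + t_2)\right) &= \exp(At_1)\exp(At_2) \\
\frac{d}{dt}\exp(At) &= A\exp(At) = \exp(At)\,A
\end{aligned}
$$
Other properties of the matrix exponential:
$$
\begin{aligned}
\exp(A)\exp(-A) &= I \\
\exp(A + B) &= \exp(A)\exp(B) \quad \text{when } AB - BA = 0
\end{aligned}
$$
The last identity generally fails when A and B do not commute.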
![Page 19: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/19.jpg)
Solution of continuous LTI state equations (vector condition)
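$$
\begin{aligned}
\dot{\mathbf{x}}(t) &= A\,\mathbf{x}(t) + C\,\mathbf{u}(t) \\
\exp(-At)\,\dot{\mathbf{x}}(t) &= \exp(-At)\,A\,\mathbf{x}(t) + \exp(-At)\,C\,\mathbf{u}(t) \\
\frac{d}{dt}\left[\exp(-At)\,\mathbf{x}(t)\right] &= \exp(-At)\,C\,\mathbf{u}(t) \\
\int_0^{\tau} d\left[\exp(-At)\,\mathbf{x}(t)\right] &= \int_0^{\tau} \exp(-At)\,C\,\mathbf{u}(t)\,dt \\
\exp(-A\tau)\,\mathbf{x}(\tau) - \mathbf{x}(0) &= \int_0^{\tau} \exp(-At)\,C\,\mathbf{u}(t)\,dt \\
\mathbf{x}(\tau) &= \exp(A\tau)\,\mathbf{x}(0) + \exp(A\tau)\int_0^{\tau} \exp(-At)\,C\,\mathbf{u}(t)\,dt \\
\mathbf{x}(\tau) &= \exp(A\tau)\,\mathbf{x}(0) + \int_0^{\tau} \exp\left(A(\tau - t)\right)C\,\mathbf{u}(t)\,dt
\end{aligned}
$$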
![Page 20: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/20.jpg)
Solution of discrete LTI state equations
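$$
\begin{aligned}
\mathbf{x}^{(k+1)} &= A\,\mathbf{x}^{(k)} + C\,\mathbf{u}^{(k)} \\
\mathbf{x}^{(1)} &= A\,\mathbf{x}^{(0)} + C\,\mathbf{u}^{(0)} \\
\mathbf{x}^{(2)} &= A\,\mathbf{x}^{(1)} + C\,\mathbf{u}^{(1)} = A^2\mathbf{x}^{(0)} + AC\,\mathbf{u}^{(0)} + C\,\mathbf{u}^{(1)} \\
\mathbf{x}^{(k)} &= A^k\mathbf{x}^{(0)} + \sum_{m=0}^{k-1} A^{k-m-1}C\,\mathbf{u}^{(m)}
\end{aligned}
$$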
![Page 21: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/21.jpg)
Relating discrete and continuous representation of a linear system
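The solution of the continuous system is
$$\mathbf{x}(t) = \exp(A_c t)\,\mathbf{x}(0) + \exp(A_c t)\int_0^{t} \exp(-A_c\tau)\,C_c\,\mathbf{u}(\tau)\,d\tau$$
Sample at $t = k\Delta$, where $\Delta$ is the sampling interval, and write $\mathbf{x}^{(k)} \equiv \mathbf{x}(k\Delta)$:
$$
\begin{aligned}
\mathbf{x}^{(k+1)} &= \exp\left(A_c(k+1)\Delta\right)\mathbf{x}(0) + \exp\left(A_c(k+1)\Delta\right)\int_0^{(k+1)\Delta} \exp(-A_c\tau)\,C_c\,\mathbf{u}(\tau)\,d\tau \\
&= \exp(A_c\Delta)\left[\exp(A_c k\Delta)\,\mathbf{x}(0) + \exp(A_c k\Delta)\int_0^{k\Delta} \exp(-A_c\tau)\,C_c\,\mathbf{u}(\tau)\,d\tau\right] \\
&\qquad + \exp\left(A_c(k+1)\Delta\right)\int_{k\Delta}^{(k+1)\Delta} \exp(-A_c\tau)\,C_c\,\mathbf{u}(\tau)\,d\tau \\
&= \exp(A_c\Delta)\,\mathbf{x}^{(k)} + \int_{k\Delta}^{(k+1)\Delta} \exp\left(A_c\left((k+1)\Delta - \tau\right)\right)C_c\,\mathbf{u}(\tau)\,d\tau
\end{aligned}
$$
Assume that u(t) is constant between two sampling times, $\mathbf{u}(\tau) = \mathbf{u}^{(k)}$ for $k\Delta \le \tau < (k+1)\Delta$:
$$
\begin{aligned}
\int_{k\Delta}^{(k+1)\Delta} \exp\left(A_c\left((k+1)\Delta - \tau\right)\right)d\tau\; C_c\,\mathbf{u}^{(k)}
&= -A_c^{-1}\Big[\exp\left(A_c\left((k+1)\Delta - \tau\right)\right)\Big]_{k\Delta}^{(k+1)\Delta} C_c\,\mathbf{u}^{(k)} \\
&= A_c^{-1}\left(\exp(A_c\Delta) - I\right)C_c\,\mathbf{u}^{(k)}
\end{aligned}
$$
$$\mathbf{x}^{(k+1)} = \exp(A_c\Delta)\,\mathbf{x}^{(k)} + A_c^{-1}\left(\exp(A_c\Delta) - I\right)C_c\,\mathbf{u}^{(k)}$$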
![Page 22: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/22.jpg)
Discrete and continuous representation of a linear system
(noise free scenario)
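Continuous system:
$$
\begin{aligned}
\dot{\mathbf{x}}(t) &= A_c\,\mathbf{x}(t) + C_c\,\mathbf{u}(t) \\
\mathbf{y}(t) &= B_c\,\mathbf{x}(t) + D_c\,\mathbf{u}(t)
\end{aligned}
$$
Discrete system:
$$
\begin{aligned}
\mathbf{x}^{(k+1)} &= A_d\,\mathbf{x}^{(k)} + C_d\,\mathbf{u}^{(k)} \\
\mathbf{y}^{(k)} &= B_d\,\mathbf{x}^{(k)} + D_d\,\mathbf{u}^{(k)}
\end{aligned}
$$
With $\Delta$ the sampling interval:
$$
\begin{aligned}
A_d &= \exp(A_c\Delta) \\
C_d &= A_c^{-1}\left(\exp(A_c\Delta) - I\right)C_c \\
B_d &= B_c \\
D_d &= D_c
\end{aligned}
$$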
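The exact conversion and Euler's approximation can be compared side by side. The sketch below assumes NumPy, an invertible A_c, and a command held constant over each sampling interval; the helper names are hypothetical, the matrix exponential is computed by a truncated Taylor series purely to keep the example self-contained, and the parameter values are those quoted for the eye model elsewhere in this lecture (k = 3, b = 0.65, m = 0.004, Δ = 0.001 s). As an independent check, it uses the standard identity that exponentiating the augmented matrix [[A_c, C_c], [0, 0]]Δ yields A_d and C_d in its top block row.

```python
import numpy as np

def expm_taylor(M, n_terms=40):
    """Truncated Taylor series for exp(M); adequate for small-norm M."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for n in range(1, n_terms):
        term = term @ M / n
        result = result + term
    return result

def discretize_exact(A_c, C_c, dt):
    """A_d = exp(A_c dt), C_d = A_c^{-1} (exp(A_c dt) - I) C_c (A_c invertible)."""
    A_d = expm_taylor(A_c * dt)
    C_d = np.linalg.solve(A_c, (A_d - np.eye(A_c.shape[0])) @ C_c)
    return A_d, C_d

def discretize_euler(A_c, C_c, dt):
    """A_d = I + A_c dt, C_d = C_c dt."""
    return np.eye(A_c.shape[0]) + A_c * dt, C_c * dt

k, b, m, dt = 3.0, 0.65, 0.004, 0.001
A_c = np.array([[0.0, 1.0], [-k / m, -b / m]])
C_c = np.array([[0.0], [1.0 / m]])

A_d, C_d = discretize_exact(A_c, C_c, dt)
A_e, C_e = discretize_euler(A_c, C_c, dt)

# Independent check via the augmented matrix [[A_c, C_c], [0, 0]] * dt.
aug = np.zeros((3, 3))
aug[:2, :2] = A_c * dt
aug[:2, 2:] = C_c * dt
aug_exp = expm_taylor(aug)
check_A = np.max(np.abs(aug_exp[:2, :2] - A_d))
check_C = np.max(np.abs(aug_exp[:2, 2:] - C_d))
euler_gap = np.max(np.abs(A_d - A_e))
print(A_d, C_d, euler_gap)
```

For this stiff system the Euler matrices differ visibly from the exact ones even at a 1 ms step, which is one reason to prefer the matrix-exponential form.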
![Page 23: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/23.jpg)
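State noise in the continuous system
State noise in continuous domain:
$$\dot{\mathbf{x}}(t) = A_c\,\mathbf{x}(t) + C_c\,\mathbf{u}(t), \qquad E\left[\mathbf{u}(t)\,\mathbf{u}(\tau)^T\right] = Q\,\delta(t - \tau)$$
$$\mathbf{x}^{(k+1)} = \exp(A_c\Delta)\,\mathbf{x}^{(k)} + \int_{k\Delta}^{(k+1)\Delta} \exp\left(A_c\left((k+1)\Delta - \tau\right)\right)C_c\,\mathbf{u}(\tau)\,d\tau$$
We note that for small Δ, the term inside the exponential is near zero over the range kΔ to (k+1)Δ. Therefore, we can approximate the matrix exponential inside the integral with an identity matrix.
Equivalent state noise in discrete domain:
$$
\begin{aligned}
\mathbf{x}^{(k+1)} &\approx \exp(A_c\Delta)\,\mathbf{x}^{(k)} + \int_{k\Delta}^{(k+1)\Delta} C_c\,\mathbf{u}(\tau)\,d\tau = A_d\,\mathbf{x}^{(k)} + \mathbf{w}^{(k)} \\
E\left[\mathbf{w}^{(k)}\mathbf{w}^{(k)T}\right] &= E\left[\int_{k\Delta}^{(k+1)\Delta} C_c\,\mathbf{u}(t)\,dt \int_{k\Delta}^{(k+1)\Delta} \mathbf{u}(\tau)^T C_c^T\,d\tau\right] \\
&= C_c \int_{k\Delta}^{(k+1)\Delta}\!\!\int_{k\Delta}^{(k+1)\Delta} E\left[\mathbf{u}(t)\,\mathbf{u}(\tau)^T\right] dt\,d\tau\; C_c^T \\
&= C_c\,Q\,C_c^T\,\Delta
\end{aligned}
$$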
![Page 24: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/24.jpg)
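Measurement noise in the continuous system
Measurement noise in continuous domain:
$$\mathbf{y}(t) = B_c\,\mathbf{x}(t) + \mathbf{v}(t), \qquad E\left[\mathbf{v}(t)\,\mathbf{v}(\tau)^T\right] = R\,\delta(t - \tau)$$
Suppose that we average the sample y(t) over the discrete interval to get our discrete sample:
$$
\begin{aligned}
\mathbf{y}^{(k)} &= \frac{1}{\Delta}\int_{k\Delta}^{(k+1)\Delta} \mathbf{y}(t)\,dt = \frac{1}{\Delta}\int_{k\Delta}^{(k+1)\Delta} \left(B_c\,\mathbf{x}(t) + \mathbf{v}(t)\right)dt \\
&\approx B_c\,\mathbf{x}^{(k)} + \mathbf{v}^{(k)}
\end{aligned}
$$
Equivalent measurement noise in discrete domain. Noise in discrete domain is:
$$
\begin{aligned}
\mathbf{v}^{(k)} &= \frac{1}{\Delta}\int_{k\Delta}^{(k+1)\Delta} \mathbf{v}(t)\,dt \\
E\left[\mathbf{v}^{(k)}\mathbf{v}^{(k)T}\right] &= \frac{1}{\Delta^2}\int_{k\Delta}^{(k+1)\Delta}\!\!\int_{k\Delta}^{(k+1)\Delta} E\left[\mathbf{v}(t)\,\mathbf{v}(\tau)^T\right]dt\,d\tau \\
&= \frac{1}{\Delta^2}\,R\,\Delta = \frac{R}{\Delta}
\end{aligned}
$$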
![Page 25: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/25.jpg)
Discrete and continuous representation of a linear system with noise
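Continuous system:
$$
\begin{aligned}
\dot{\mathbf{x}}(t) &= A_c\,\mathbf{x}(t) + C_c\,\mathbf{u}(t) + \boldsymbol{\varepsilon}_u(t), & E\left[\boldsymbol{\varepsilon}_u(t)\,\boldsymbol{\varepsilon}_u(\tau)^T\right] &= Q\,\delta(t - \tau) \\
\mathbf{y}(t) &= B_c\,\mathbf{x}(t) + \boldsymbol{\varepsilon}_y(t), & E\left[\boldsymbol{\varepsilon}_y(t)\,\boldsymbol{\varepsilon}_y(\tau)^T\right] &= R\,\delta(t - \tau)
\end{aligned}
$$
Equivalent discrete system:
$$
\begin{aligned}
\mathbf{x}^{(k+1)} &= A_d\,\mathbf{x}^{(k)} + C_d\,\mathbf{u}^{(k)} + \boldsymbol{\varepsilon}_u^{(k)}, & E\left[\boldsymbol{\varepsilon}_u^{(k)}\boldsymbol{\varepsilon}_u^{(k)T}\right] &= Q\,\Delta \\
\mathbf{y}^{(k)} &= B_d\,\mathbf{x}^{(k)} + \boldsymbol{\varepsilon}_y^{(k)}, & E\left[\boldsymbol{\varepsilon}_y^{(k)}\boldsymbol{\varepsilon}_y^{(k)T}\right] &= \frac{R}{\Delta}
\end{aligned}
$$
With $\Delta$ the sampling interval:
$$
\begin{aligned}
A_d &= \exp(A_c\Delta) \\
C_d &= A_c^{-1}\left(\exp(A_c\Delta) - I\right)C_c \\
B_d &= B_c
\end{aligned}
$$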
![Page 26: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/26.jpg)
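Continuous time model of the eye:
$$k = 3\ \text{N.m/rad}, \qquad b = 0.65\ \text{N.m.s/rad}, \qquad m = 0.004\ \text{kg.m/rad}$$
$$x_1 \equiv x, \qquad x_2 \equiv \dot{x}$$
$$
\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} =
\begin{bmatrix} 0 & 1 \\ -\frac{k}{m} & -\frac{b}{m} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} +
\begin{bmatrix} 0 \\ \frac{1}{m} \end{bmatrix} u,
\qquad
\dot{\mathbf{x}} = A_c\,\mathbf{x} + C_c\,u, \qquad y = \begin{bmatrix} 1 & 0 \end{bmatrix}\mathbf{x}
$$
Discrete time model of the eye:
$$
\begin{aligned}
\mathbf{x}^{(k+1)} &= A\,\mathbf{x}^{(k)} + C\,u^{(k)}, \qquad \Delta = 0.001\ \text{sec} \\
A &= \exp(A_c\Delta) \\
C &= A_c^{-1}\left(\exp(A_c\Delta) - I\right)C_c
\end{aligned}
$$
Optimal control problem:
$$
\begin{aligned}
\mathbf{y}_h &= G\,\mathbf{x}_h \\
J &= (\mathbf{y}_h - \mathbf{r}_h)^T T (\mathbf{y}_h - \mathbf{r}_h) + \mathbf{u}_h^T L\,\mathbf{u}_h \\
J &= (G\mathbf{x}_h - \mathbf{r}_h)^T T (G\mathbf{x}_h - \mathbf{r}_h) + \mathbf{u}_h^T L\,\mathbf{u}_h \\
\mathbf{x}_h &= E\,\mathbf{x}^{(0)} + F\,\mathbf{u}_h
\end{aligned}
$$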
![Page 27: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/27.jpg)
Make a 30 deg (~0.5 rad) saccade in 0.5 seconds
$$
\mathbf{r}_h = \begin{bmatrix}\mathbf{r}^{(1)} \\ \mathbf{r}^{(2)} \\ \vdots \\ \mathbf{r}^{(p)}\end{bmatrix}, \qquad
T = \begin{bmatrix} T^{(1)} & 0 & \cdots & 0 \\ 0 & T^{(2)} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & T^{(p)} \end{bmatrix}, \qquad
L = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}
$$
[Figure: panels plotted against Time (sec), 0 to 0.08 s, labeled r, T, u, and x.]
![Page 28: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/28.jpg)
[Figure: Position (rad), Velocity (rad/s), and Motor command (N.m) plotted against Time (sec), 0 to 0.08 s, for saccades of 5 deg, 10 deg, 15 deg, and 20 deg.]
Eye muscle activity for a 10 deg saccade.
![Page 29: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/29.jpg)
Resolving redundancies
Suppose that we have a cursor whose position depends on the sum of the positions of the left and right joysticks. Suppose that the left joystick is heavier than the right joystick. We want to move the cursor to some location. How much should we move each joystick?
$$k = 1\ \text{N.m/rad}, \qquad b = 0.15\ \text{N.m.s/rad}, \qquad m_1 = 0.1\ \text{kg.m/rad}, \qquad m_2 = 0.15\ \text{kg.m/rad}$$
$x_1$: position of the right hand
$x_2$: position of the left hand
$y$: cursor position (the observable variable)
$$y = x_1 + x_2$$
[Figure: two panels plotted against time, 0 to 0.5 s. One shows y, x1, and x2; the other shows u1, u2, and r.]
![Page 30: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/30.jpg)
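Summary: Optimal control of a linear system with quadratic cost
$$
\begin{aligned}
\mathbf{x}^{(k+1)} &= A\,\mathbf{x}^{(k)} + C\,\mathbf{u}^{(k)} \\
\mathbf{y}^{(k+1)} &= B\,\mathbf{x}^{(k+1)} \\
\mathbf{y}_h &= G\,\mathbf{x}_h \\
\mathbf{x}_h &= E\,\mathbf{x}^{(0)} + F\,\mathbf{u}_h \\
J_a &= (\mathbf{y}_h - \mathbf{r}_h)^T T (\mathbf{y}_h - \mathbf{r}_h) + \mathbf{u}_h^T L\,\mathbf{u}_h + \boldsymbol{\lambda}^T\left(\mathbf{x}_h - E\mathbf{x}^{(0)} - F\mathbf{u}_h\right) \\
\mathbf{u}_h &= \left(L + F^T G^T T G F\right)^{-1} F^T G^T T\left(\mathbf{r}_h - GE\,\mathbf{x}^{(0)}\right)
\end{aligned}
$$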
![Page 31: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/31.jpg)
Issues with the control policy:
• What if the system gets perturbed during the control policy? With the current approach, there is no compensation for the perturbation.
• In reality, both the state update equation and the measurement equation are subject to noise. How do we take that into account?
• To resolve this, we need a way to figure out what command to produce, given that we find ourselves at some state x at some time k. Once we figure this out, we will consider the situation where we cannot measure x directly, but have noise to deal with. Our best estimate will be through the Kalman filter. This will link estimation with control.
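Starting at state: $\mathbf{x}^{(0)}$
Sequence of actions: $\mathbf{u}^{(0)}, \mathbf{u}^{(1)}, \ldots, \mathbf{u}^{(p-1)}$
Observations: $\mathbf{y}^{(1)}, \mathbf{y}^{(2)}, \ldots, \mathbf{y}^{(p)} = B\mathbf{x}^{(1)}, B\mathbf{x}^{(2)}, \ldots, B\mathbf{x}^{(p)}$
Cost to minimize:
$$J = \sum_{k=0}^{p-1} \mathbf{u}^{(k)T} L^{(k)}\,\mathbf{u}^{(k)} + \mathbf{y}^{(k+1)T}\, T^{(k+1)}\,\mathbf{y}^{(k+1)}$$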
![Page 32: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/32.jpg)
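Cost at the last time point:
$$
\begin{aligned}
J^{(p)} &= \mathbf{y}^{(p)T}\, T^{(p)}\, \mathbf{y}^{(p)} = \mathbf{x}^{(p)T} B^T T^{(p)} B\, \mathbf{x}^{(p)} \\
W^{(p)} &\equiv B^T T^{(p)} B \\
J^{(p)} &= \left(A\mathbf{x}^{(p-1)} + C\mathbf{u}^{(p-1)}\right)^T W^{(p)} \left(A\mathbf{x}^{(p-1)} + C\mathbf{u}^{(p-1)}\right) \\
&= \mathbf{x}^{(p-1)T} A^T W^{(p)} A\, \mathbf{x}^{(p-1)} + \mathbf{u}^{(p-1)T} C^T W^{(p)} C\, \mathbf{u}^{(p-1)} + 2\,\mathbf{u}^{(p-1)T} C^T W^{(p)} A\, \mathbf{x}^{(p-1)}
\end{aligned}
$$
Note that at the last time step, cost is a quadratic function of state.
Cost-to-go at the next to the last time point:
$$
\begin{aligned}
J^{(p-1)} &= \mathbf{y}^{(p-1)T}\, T^{(p-1)}\, \mathbf{y}^{(p-1)} + \mathbf{u}^{(p-1)T} L^{(p-1)}\, \mathbf{u}^{(p-1)} + J^{(p)} \\
&= \mathbf{x}^{(p-1)T} B^T T^{(p-1)} B\, \mathbf{x}^{(p-1)} + \mathbf{u}^{(p-1)T} L^{(p-1)}\, \mathbf{u}^{(p-1)} + J^{(p)} \\
\frac{dJ^{(p-1)}}{d\mathbf{u}^{(p-1)}} &= 2L^{(p-1)}\mathbf{u}^{(p-1)} + 2C^T W^{(p)} C\, \mathbf{u}^{(p-1)} + 2C^T W^{(p)} A\, \mathbf{x}^{(p-1)} = 0 \\
\mathbf{u}^{(p-1)} &= -\left(L^{(p-1)} + C^T W^{(p)} C\right)^{-1} C^T W^{(p)} A\, \mathbf{x}^{(p-1)} \\
G^{(p-1)} &\equiv \left(L^{(p-1)} + C^T W^{(p)} C\right)^{-1} C^T W^{(p)} A \\
\mathbf{u}^{(p-1)} &= -G^{(p-1)}\, \mathbf{x}^{(p-1)}
\end{aligned}
$$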
![Page 33: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/33.jpg)
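We will now show that if we choose the optimal u at step p-1, then cost to go is once again a quadratic function of state x.
$$
\begin{aligned}
J^{(p)} &= \mathbf{x}^{(p-1)T} A^T W^{(p)} A\, \mathbf{x}^{(p-1)} + \mathbf{u}^{(p-1)T} C^T W^{(p)} C\, \mathbf{u}^{(p-1)} + 2\,\mathbf{u}^{(p-1)T} C^T W^{(p)} A\, \mathbf{x}^{(p-1)} \\
J^{(p-1)} &= \mathbf{x}^{(p-1)T} B^T T^{(p-1)} B\, \mathbf{x}^{(p-1)} + \mathbf{u}^{(p-1)T} L^{(p-1)}\, \mathbf{u}^{(p-1)} + J^{(p)} \\
&= \mathbf{x}^{(p-1)T}\left(B^T T^{(p-1)} B + A^T W^{(p)} A\right)\mathbf{x}^{(p-1)} + \mathbf{u}^{(p-1)T}\left(L^{(p-1)} + C^T W^{(p)} C\right)\mathbf{u}^{(p-1)} + 2\,\mathbf{u}^{(p-1)T} C^T W^{(p)} A\, \mathbf{x}^{(p-1)}
\end{aligned}
$$
Substitute the optimal command $\mathbf{u}^{(p-1)} = -G^{(p-1)}\mathbf{x}^{(p-1)}$:
$$
\begin{aligned}
J^{(p-1)} &= \mathbf{x}^{(p-1)T}\left(B^T T^{(p-1)} B + A^T W^{(p)} A\right)\mathbf{x}^{(p-1)} \\
&\quad + \mathbf{x}^{(p-1)T} G^{(p-1)T}\left(L^{(p-1)} + C^T W^{(p)} C\right) G^{(p-1)}\, \mathbf{x}^{(p-1)} - 2\,\mathbf{x}^{(p-1)T} G^{(p-1)T} C^T W^{(p)} A\, \mathbf{x}^{(p-1)}
\end{aligned}
$$
Because $\left(L^{(p-1)} + C^T W^{(p)} C\right) G^{(p-1)} = C^T W^{(p)} A$, the matrix in the second term can be simplified to $A^T W^{(p)} C G^{(p-1)}$, and the third term can be simplified to $-2\,\mathbf{x}^{(p-1)T} A^T W^{(p)} C G^{(p-1)}\, \mathbf{x}^{(p-1)}$. Therefore:
$$
\begin{aligned}
J^{(p-1)} &= \mathbf{x}^{(p-1)T}\left(B^T T^{(p-1)} B + A^T W^{(p)} A - A^T W^{(p)} C G^{(p-1)}\right)\mathbf{x}^{(p-1)} \\
&= \mathbf{x}^{(p-1)T}\left(B^T T^{(p-1)} B + A^T W^{(p)}\left(A - C G^{(p-1)}\right)\right)\mathbf{x}^{(p-1)} \\
&= \mathbf{x}^{(p-1)T}\, W^{(p-1)}\, \mathbf{x}^{(p-1)}
\end{aligned}
$$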
![Page 34: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/34.jpg)
We just showed that for the last time step, the cost to go is a quadratic function of x:
$$J^{(p)} = \mathbf{x}^{(p)T}\, W^{(p)}\, \mathbf{x}^{(p)}$$
The optimal u at time point p-1 minimizes the cost to go J(p-1):
$$
\begin{aligned}
G^{(p-1)} &= \left(L^{(p-1)} + C^T W^{(p)} C\right)^{-1} C^T W^{(p)} A \\
\mathbf{u}^{(p-1)} &= -G^{(p-1)}\, \mathbf{x}^{(p-1)}
\end{aligned}
$$
If at time point p-1 we indeed carry out this optimal policy u, then the cost to go at time p-1 also becomes a quadratic function of x:
$$
\begin{aligned}
J^{(p-1)} &= \mathbf{x}^{(p-1)T}\, W^{(p-1)}\, \mathbf{x}^{(p-1)} \\
W^{(p-1)} &= B^T T^{(p-1)} B + A^T W^{(p)}\left(A - C G^{(p-1)}\right)
\end{aligned}
$$
If we now repeat the process and find the optimal u for time point p-2, it will be:
$$
\begin{aligned}
G^{(p-2)} &= \left(L^{(p-2)} + C^T W^{(p-1)} C\right)^{-1} C^T W^{(p-1)} A \\
\mathbf{u}^{(p-2)} &= -G^{(p-2)}\, \mathbf{x}^{(p-2)}
\end{aligned}
$$
And if we apply the optimal u at time points p-2 and p-1, then the cost to go at time point p-2 will be a quadratic function of x:
$$
\begin{aligned}
J^{(p-2)} &= \mathbf{x}^{(p-2)T}\, W^{(p-2)}\, \mathbf{x}^{(p-2)} \\
W^{(p-2)} &= B^T T^{(p-2)} B + A^T W^{(p-1)}\left(A - C G^{(p-2)}\right)
\end{aligned}
$$
So in general, if for time points t+1, …, p we have calculated the optimal policy for u, then the above gives us a recipe to compute the optimal policy for time point t.
![Page 35: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/35.jpg)
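Summary of the linear quadratic tracking problem
$$
\begin{aligned}
\mathbf{x}^{(k+1)} &= A\,\mathbf{x}^{(k)} + C\,\mathbf{u}^{(k)} \\
\mathbf{y}^{(k+1)} &= B\,\mathbf{x}^{(k+1)} \\
J^{(0)} &= \sum_{k=0}^{p-1} \mathbf{y}^{(k+1)T}\, T^{(k+1)}\, \mathbf{y}^{(k+1)} + \mathbf{u}^{(k)T} L^{(k)}\, \mathbf{u}^{(k)}
\end{aligned}
$$
[Figure: graphical model of the control problem with nodes x(0), u(0), x(1), y(1), …, u(p-1), x(p), y(p).]
Cost to go:
$$
\begin{aligned}
J^{(p)} &= \mathbf{x}^{(p)T}\, W^{(p)}\, \mathbf{x}^{(p)}, \qquad W^{(p)} = B^T T^{(p)} B \\
G^{(p-1)} &= \left(L^{(p-1)} + C^T W^{(p)} C\right)^{-1} C^T W^{(p)} A \\
\mathbf{u}^{(p-1)} &= -G^{(p-1)}\, \mathbf{x}^{(p-1)} \\
J^{(p-1)} &= \mathbf{x}^{(p-1)T}\, W^{(p-1)}\, \mathbf{x}^{(p-1)} \\
W^{(p-1)} &= B^T T^{(p-1)} B + A^T W^{(p)}\left(A - C G^{(p-1)}\right) \\
&\;\;\vdots \\
G^{(0)} &= \left(L^{(0)} + C^T W^{(1)} C\right)^{-1} C^T W^{(1)} A \\
\mathbf{u}^{(0)} &= -G^{(0)}\, \mathbf{x}^{(0)} \\
J^{(0)} &= \mathbf{x}^{(0)T}\, W^{(0)}\, \mathbf{x}^{(0)} \\
W^{(0)} &= B^T T^{(0)} B + A^T W^{(1)}\left(A - C G^{(0)}\right)
\end{aligned}
$$
The procedure is to compute the matrices W and G from the last time point to the first time point.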
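The backward computation of W and G followed by a forward simulation can be sketched as follows. This is a minimal illustration, assuming NumPy, symmetric cost matrices, and a target at zero (no reference trajectory); the function names are hypothetical, the mass-spring-damper values are those quoted for the elbow model in this lecture, and the Euler approximation is used for the discretization only to keep the example short. The check at the end confirms that the cost accumulated along the closed-loop trajectory matches the predicted cost to go x(0)ᵀ W(0) x(0).

```python
import numpy as np

def lq_backward_pass(A, B, C, T_list, L_list):
    """T_list[k] = T^(k) for k = 0..p, L_list[k] = L^(k) for k = 0..p-1."""
    p = len(L_list)
    W = [None] * (p + 1)
    G = [None] * p
    W[p] = B.T @ T_list[p] @ B
    for k in range(p - 1, -1, -1):            # last time point to first
        S = L_list[k] + C.T @ W[k + 1] @ C
        G[k] = np.linalg.solve(S, C.T @ W[k + 1] @ A)
        W[k] = B.T @ T_list[k] @ B + A.T @ W[k + 1] @ (A - C @ G[k])
    return W, G

def run_policy(A, B, C, G, T_list, L_list, x0):
    """Apply u^(k) = -G^(k) x^(k) and accumulate the quadratic cost."""
    x = x0.copy()
    cost = 0.0
    for k in range(len(G)):
        u = -G[k] @ x
        y = B @ x
        cost += y @ T_list[k] @ y + u @ L_list[k] @ u
        x = A @ x + C @ u
    y = B @ x
    cost += y @ T_list[-1] @ y                # cost at the last time point
    return x, cost

# Elbow-like parameters; Euler discretization is an assumption for brevity.
k_s, b, m, dt = 3.0, 0.45, 0.3, 0.01
A_c = np.array([[0.0, 1.0], [-k_s / m, -b / m]])
C_c = np.array([[0.0], [1.0 / m]])
A = np.eye(2) + A_c * dt
C = C_c * dt
B = np.array([[1.0, 0.0]])

p = 30
T_list = [np.array([[0.0]])] * p + [np.array([[1e4]])]   # penalize only the endpoint
L_list = [np.array([[1e-3]])] * p
x0 = np.array([0.5, 0.0])

W, G = lq_backward_pass(A, B, C, T_list, L_list)
x_final, cost = run_policy(A, B, C, G, T_list, L_list, x0)
predicted_cost = x0 @ W[0] @ x0
print(cost, predicted_cost, x_final)
```

Because the gains G depend only on the model and the cost matrices, they can be computed once in advance and then applied to whatever state the system is actually in, which is what allows this policy to respond to perturbations.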
![Page 36: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/36.jpg)
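Modeling of an elbow movement
Continuous time model of the elbow:
$$k = 3\ \text{N.m/rad}, \qquad b = 0.45\ \text{N.m.s/rad}, \qquad m = 0.3\ \text{kg.m/rad}$$
$$x_1 \equiv x, \qquad x_2 \equiv \dot{x}$$
$$
\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} =
\begin{bmatrix} 0 & 1 \\ -\frac{k}{m} & -\frac{b}{m} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} +
\begin{bmatrix} 0 \\ \frac{1}{m} \end{bmatrix} u,
\qquad
\dot{\mathbf{x}} = A_c\,\mathbf{x} + C_c\,u, \qquad y = \begin{bmatrix} 1 & 0 \end{bmatrix}\mathbf{x}
$$
Discrete time model of the elbow:
$$
\begin{aligned}
\mathbf{x}^{(k+1)} &= A\,\mathbf{x}^{(k)} + C\,u^{(k)}, \qquad \Delta = 0.01\ \text{sec} \\
A &= \exp(A_c\Delta) \\
C &= A_c^{-1}\left(\exp(A_c\Delta) - I\right)C_c
\end{aligned}
$$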
![Page 37: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/37.jpg)
Goal: Reach a target at 30 deg in 300 ms time and hold it there for 100 ms.
[Figure: three conditions (Unperturbed movement; Arm held at start for 200 ms; Force pulse to the arm for 50 ms), each with a Position panel and a Motor command panel plotted against time (sec), 0 to 0.4 s. Additional panels show L, the position cost T, and the velocity cost T plotted against time (sec).]
![Page 38: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/38.jpg)
Movement with a via point: we set the cost to be high at the time when we are supposed to be at the via points.
[Figure: position cost T, Motor command, Position, and position gain G plotted against time (sec), 0 to 0.6 s.]
![Page 39: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/39.jpg)
Stochastic optimal control
Biological processes have noise. For example, neurons fire stochastically in response to a constant input, and muscles produce a stochastic force in response to constant stimulation. Here we will see how to solve the optimal control problem with additive Gaussian noise.
$$
\begin{aligned}
\mathbf{x}^{(k+1)} &= A\,\mathbf{x}^{(k)} + C\,\mathbf{u}^{(k)} + \boldsymbol{\varepsilon}_x^{(k)}, & \boldsymbol{\varepsilon}_x &\sim N(0, Q) \\
\mathbf{y}^{(k)} &= B\,\mathbf{x}^{(k)} + \boldsymbol{\varepsilon}_y^{(k)}, & \boldsymbol{\varepsilon}_y &\sim N(0, R)
\end{aligned}
$$
Cost to minimize:
$$J = \sum_{k=0}^{p-1} \mathbf{u}^{(k)T} L^{(k)}\, \mathbf{u}^{(k)} + \mathbf{y}^{(k+1)T}\, T^{(k+1)}\, \mathbf{y}^{(k+1)}$$
Because there is noise, we are no longer able to observe x directly. Rather, the best we can do is to estimate it. As we saw before, for a linear system with additive noise the best estimate of state is through the Kalman filter. So our goal is to determine the best command u for the current estimate of x so that we can minimize the global cost function.
Approach: as before, at the last time point p the cost is a quadratic function of x. We will find the optimal motor command for time point p-1 so that it minimizes the expected cost to go. If we perform the optimal motor command at p-1, then we will see that the cost to go at p-1 is again a quadratic function of x.
![Page 40: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/40.jpg)
Preliminaries: Expected value of a squared random variable. In the following example, we assume that x is the random variable.
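Scalar x:
$$
\begin{aligned}
v &= x^2 \\
\operatorname{var}[x] &= E[x^2] - E[x]^2 \\
E[v] &= E[x^2] = \operatorname{var}[x] + E[x]^2
\end{aligned}
$$
A useful identity for a vector $\mathbf{r} = \begin{bmatrix} r_1 & r_2 & \cdots & r_n \end{bmatrix}^T$:
$$
\mathbf{r}^T\mathbf{r} = r_1^2 + r_2^2 + \cdots + r_n^2, \qquad
\mathbf{r}\mathbf{r}^T = \begin{bmatrix} r_1^2 & r_1 r_2 & \cdots & r_1 r_n \\ r_2 r_1 & r_2^2 & \cdots & r_2 r_n \\ \vdots & & \ddots & \vdots \\ r_n r_1 & r_n r_2 & \cdots & r_n^2 \end{bmatrix}, \qquad
\operatorname{tr}\left[\mathbf{r}\mathbf{r}^T\right] = \sum_{i=1}^{n} r_i^2
$$
$$\mathbf{r}^T\mathbf{r} = \operatorname{tr}\left[\mathbf{r}\mathbf{r}^T\right]$$
Vector x:
$$
\begin{aligned}
v &= \mathbf{x}^T\mathbf{x} \\
\operatorname{var}[\mathbf{x}] &= E\left[\mathbf{x}\mathbf{x}^T\right] - E[\mathbf{x}]\,E[\mathbf{x}]^T \\
E[v] &= E\left[\mathbf{x}^T\mathbf{x}\right] = E\left[\operatorname{tr}\left(\mathbf{x}\mathbf{x}^T\right)\right] = \operatorname{tr}\left(E\left[\mathbf{x}\mathbf{x}^T\right]\right) \\
&= \operatorname{tr}\left(\operatorname{var}[\mathbf{x}]\right) + \operatorname{tr}\left(E[\mathbf{x}]\,E[\mathbf{x}]^T\right) \\
&= \operatorname{tr}\left(\operatorname{var}[\mathbf{x}]\right) + E[\mathbf{x}]^T E[\mathbf{x}]
\end{aligned}
$$
With a weighting matrix A:
$$
\begin{aligned}
v &= \mathbf{x}^T A\, \mathbf{x} \\
E[v] &= \operatorname{tr}\left(A \operatorname{var}[\mathbf{x}]\right) + E[\mathbf{x}]^T A\, E[\mathbf{x}]
\end{aligned}
$$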
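The identity for the expected value of a quadratic form can be checked exactly on a finite set of equally likely points, because the same algebra holds for an empirical distribution when the variance is computed with a 1/N normalization. The sketch below assumes NumPy; the data points and the weighting matrix are arbitrary illustrative values.

```python
import numpy as np

# Four equally likely outcomes of a 2-D random vector (illustrative values).
X = np.array([[1.0, 2.0],
              [0.0, -1.0],
              [3.0, 0.5],
              [-2.0, 4.0]])
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

mean = X.mean(axis=0)
centered = X - mean
var = centered.T @ centered / X.shape[0]      # population variance (1/N)

# Left side: E[x' A x] computed directly as an average over the outcomes.
lhs = np.mean([x @ A @ x for x in X])
# Right side: tr(A var[x]) + E[x]' A E[x].
rhs = np.trace(A @ var) + mean @ A @ mean
print(lhs, rhs)
```

Setting A to the identity matrix recovers the special case E[xᵀx] = tr(var[x]) + E[x]ᵀE[x].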
![Page 41: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/41.jpg)
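Cost at the last time point:
$$
\begin{aligned}
J^{(p)} &= \mathbf{y}^{(p)T}\, T^{(p)}\, \mathbf{y}^{(p)} \\
&= \mathbf{x}^{(p)T} B^T T^{(p)} B\, \mathbf{x}^{(p)} + 2\,\mathbf{x}^{(p)T} B^T T^{(p)} \boldsymbol{\varepsilon}_y^{(p)} + \boldsymbol{\varepsilon}_y^{(p)T}\, T^{(p)}\, \boldsymbol{\varepsilon}_y^{(p)} \\
W^{(p)} &\equiv B^T T^{(p)} B
\end{aligned}
$$
The cross term has zero mean because the measurement noise is zero mean and independent of the state. Taking the expectation given the state and command at p-1:
$$
\begin{aligned}
E\left[J^{(p)}\right] &= E\left[\mathbf{x}^{(p)T}\, W^{(p)}\, \mathbf{x}^{(p)}\right] + E\left[\boldsymbol{\varepsilon}_y^{(p)T}\, T^{(p)}\, \boldsymbol{\varepsilon}_y^{(p)}\right] \\
&= \operatorname{tr}\left[W^{(p)} \operatorname{var}\left[\mathbf{x}^{(p)}\right]\right] + E\left[\mathbf{x}^{(p)}\right]^T W^{(p)}\, E\left[\mathbf{x}^{(p)}\right] + \operatorname{tr}\left[T^{(p)} \operatorname{var}\left[\boldsymbol{\varepsilon}_y^{(p)}\right]\right] \\
&= \operatorname{tr}\left[W^{(p)} Q\right] + E\left[\mathbf{x}^{(p)}\right]^T W^{(p)}\, E\left[\mathbf{x}^{(p)}\right] + \operatorname{tr}\left[T^{(p)} R\right] \\
&= \left(A\mathbf{x}^{(p-1)} + C\mathbf{u}^{(p-1)}\right)^T W^{(p)} \left(A\mathbf{x}^{(p-1)} + C\mathbf{u}^{(p-1)}\right) + \operatorname{tr}\left[W^{(p)} Q\right] + \operatorname{tr}\left[T^{(p)} R\right] \\
&= \mathbf{x}^{(p-1)T} A^T W^{(p)} A\, \mathbf{x}^{(p-1)} + \mathbf{u}^{(p-1)T} C^T W^{(p)} C\, \mathbf{u}^{(p-1)} + 2\,\mathbf{x}^{(p-1)T} A^T W^{(p)} C\, \mathbf{u}^{(p-1)} \\
&\quad + \operatorname{tr}\left[W^{(p)} Q\right] + \operatorname{tr}\left[T^{(p)} R\right]
\end{aligned}
$$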
![Page 42: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/42.jpg)
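Cost-to-go at the next to the last time point:
$$
\begin{aligned}
J^{(p-1)} &= \mathbf{y}^{(p-1)T}\, T^{(p-1)}\, \mathbf{y}^{(p-1)} + \mathbf{u}^{(p-1)T} L^{(p-1)}\, \mathbf{u}^{(p-1)} + E\left[J^{(p)}\right] \\
&= \mathbf{x}^{(p-1)T}\left(B^T T^{(p-1)} B + A^T W^{(p)} A\right)\mathbf{x}^{(p-1)} + \mathbf{u}^{(p-1)T}\left(L^{(p-1)} + C^T W^{(p)} C\right)\mathbf{u}^{(p-1)} \\
&\quad + 2\,\mathbf{x}^{(p-1)T} A^T W^{(p)} C\, \mathbf{u}^{(p-1)} + \operatorname{tr}\left[W^{(p)} Q\right] + \operatorname{tr}\left[T^{(p)} R\right] + \boldsymbol{\varepsilon}_y^{(p-1)T}\, T^{(p-1)}\, \boldsymbol{\varepsilon}_y^{(p-1)} \\
\frac{dJ^{(p-1)}}{d\mathbf{u}^{(p-1)}} &= 2L^{(p-1)}\mathbf{u}^{(p-1)} + 2C^T W^{(p)} C\, \mathbf{u}^{(p-1)} + 2C^T W^{(p)} A\, \mathbf{x}^{(p-1)} = 0 \\
\mathbf{u}^{(p-1)} &= -\left(L^{(p-1)} + C^T W^{(p)} C\right)^{-1} C^T W^{(p)} A\, \mathbf{x}^{(p-1)} \\
G^{(p-1)} &\equiv \left(L^{(p-1)} + C^T W^{(p)} C\right)^{-1} C^T W^{(p)} A \\
\mathbf{u}^{(p-1)} &= -G^{(p-1)}\, \mathbf{x}^{(p-1)}
\end{aligned}
$$
The terms that involve only the measurement noise at p-1 do not depend on u, so they drop out of the derivative.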
So we see that if our system has additive state or measurement noises, the optimal motor command remains the same as if the system had no noises at all. When we use the optimal policy at time point p-1, we see that, as before, the cost-to-go at p-1 is a quadratic function of x. The matrix W at p-1 remains the same as when the system had no noise.
The problem is that we do not have x. The best that we can do is to estimate x via the Kalman filter. We do this in the next slide.
![Page 43: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/43.jpg)
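On trial p-1, our best estimate of x is the prior:
$$\mathbf{u}^{(p-1)} = -G^{(p-1)}\, \hat{\mathbf{x}}^{(p-1|p-2)}$$
We compute the prior for the current trial from the posterior of the last trial:
$$\hat{\mathbf{x}}^{(p-1|p-2)} = A\,\hat{\mathbf{x}}^{(p-2|p-2)} + C\,\mathbf{u}^{(p-2)}$$
The posterior estimate, where $K^{(p-2)}$ is the Kalman gain:
$$\hat{\mathbf{x}}^{(p-2|p-2)} = \hat{\mathbf{x}}^{(p-2|p-3)} + K^{(p-2)}\left(\mathbf{y}^{(p-2)} - B\,\hat{\mathbf{x}}^{(p-2|p-3)}\right)$$
$$\hat{\mathbf{x}}^{(p-1|p-2)} = A\,\hat{\mathbf{x}}^{(p-2|p-3)} + C\,\mathbf{u}^{(p-2)} + A K^{(p-2)}\left(\mathbf{y}^{(p-2)} - B\,\hat{\mathbf{x}}^{(p-2|p-3)}\right)$$
Our short-hand way to note the prior estimate of x on trial p-1:
$$
\begin{aligned}
\hat{\mathbf{x}}^{(p-1)} &\equiv \hat{\mathbf{x}}^{(p-1|p-2)} \\
\hat{\mathbf{x}}^{(p-1)} &= A\,\hat{\mathbf{x}}^{(p-2)} + C\,\mathbf{u}^{(p-2)} + A K^{(p-2)}\left(\mathbf{y}^{(p-2)} - B\,\hat{\mathbf{x}}^{(p-2)}\right) \\
\mathbf{u}^{(p-1)} &= -G^{(p-1)}\, \hat{\mathbf{x}}^{(p-1)}
\end{aligned}
$$
Although the noises in the system do not affect the gain G, the estimate of x is of course affected by the noises because the Kalman gain is influenced by them.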
![Page 44: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/44.jpg)
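Summary of stochastic optimal control for a linear system with additive Gaussian noise and quadratic cost

$$\mathbf{x}^{(k+1)} = A\,\mathbf{x}^{(k)} + C\,\mathbf{u}^{(k)} + \boldsymbol{\varepsilon}_x, \qquad \boldsymbol{\varepsilon}_x \sim N(\mathbf{0}, Q)$$

$$\mathbf{y}^{(k+1)} = B\,\mathbf{x}^{(k+1)} + \boldsymbol{\varepsilon}_y, \qquad \boldsymbol{\varepsilon}_y \sim N(\mathbf{0}, R)$$

Cost to go at the start:

$$J^{(0)} = \sum_{k=0}^{p-1} \mathbf{u}^{(k)T} L^{(k)} \mathbf{u}^{(k)} + \mathbf{y}^{(k+1)T} T^{(k+1)} \mathbf{y}^{(k+1)}$$

Cost to go at the end:

$$J^{(p)} = \mathbf{x}^{(p)T} W^{(p)} \mathbf{x}^{(p)} + w^{(p)}$$

$$W^{(p)} = B^T T^{(p)} B$$

$$w^{(p)} = E\left[\boldsymbol{\varepsilon}_y^T T^{(p)} \boldsymbol{\varepsilon}_y\right] = \mathrm{tr}\left(T^{(p)} R\right)$$

For the preceding time point:

$$G^{(p-1)} = \left(L^{(p-1)} + C^T W^{(p)} C\right)^{-1} C^T W^{(p)} A$$

$$\mathbf{u}^{(p-1)} = -G^{(p-1)}\,\hat{\mathbf{x}}^{(p-1)}$$

$$J^{(p-1)} = \mathbf{x}^{(p-1)T} W^{(p-1)} \mathbf{x}^{(p-1)} + w^{(p-1)}$$

$$W^{(p-1)} = B^T T^{(p-1)} B + A^T W^{(p)}\left(A - C\,G^{(p-1)}\right)$$

$$w^{(p-1)} = \mathrm{tr}\left(W^{(p)} Q\right) + \boldsymbol{\varepsilon}_y^T T^{(p-1)} \boldsymbol{\varepsilon}_y + w^{(p)}$$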
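The backward recursion in this summary can be written as a short loop. The sketch below is a minimal illustration under stated assumptions: the function name `lq_backward_pass` is made up for the example, `T` is a list of p+1 tracking-cost matrices indexed by time step, and `L` is a list of p motor-cost matrices. The noise covariances Q and R do not appear because they do not affect the gains.

```python
import numpy as np

def lq_backward_pass(A, C, B, T, L):
    """Return feedback gains G[0..p-1] and cost matrices W[0..p].

    Uses G(k) = (L(k) + C' W(k+1) C)^-1 C' W(k+1) A and
         W(k) = B' T(k) B + A' W(k+1) (A - C G(k)), starting from W(p) = B' T(p) B.
    """
    p = len(L)
    W = [None] * (p + 1)
    G = [None] * p
    W[p] = B.T @ T[p] @ B
    for k in range(p - 1, -1, -1):           # run time backwards
        S = L[k] + C.T @ W[k + 1] @ C
        G[k] = np.linalg.solve(S, C.T @ W[k + 1] @ A)
        W[k] = B.T @ T[k] @ B + A.T @ W[k + 1] @ (A - C @ G[k])
    return G, W

# Scalar toy problem with two time steps (all values are illustrative).
one = np.array([[1.0]])
G, W = lq_backward_pass(one, one, one, T=[one, one, one], L=[one, one])
```

For this scalar case the recursion can be checked by hand: W(2) = 1, G(1) = 0.5, W(1) = 1.5, G(0) = 0.6, W(0) = 1.6.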
![Page 45: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/45.jpg)
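The duality of the Kalman filter and optimal control

In the estimation problem, we have a model of how we think the hidden states x are related to observations y. Given an observation y, we have a rule with which we can change our estimates.

$$\mathbf{x}^{(n)} = A\,\mathbf{x}^{(n-1)} + \boldsymbol{\varepsilon}_x^{(n)}, \qquad \boldsymbol{\varepsilon}_x \sim N(\mathbf{0}, Q)$$

$$\mathbf{y}^{(n)} = H\,\mathbf{x}^{(n)} + \boldsymbol{\varepsilon}_y^{(n)}, \qquad \boldsymbol{\varepsilon}_y \sim N(\mathbf{0}, R)$$

$$\hat{\mathbf{x}}^{(n|n)} = \hat{\mathbf{x}}^{(n|n-1)} + \mathbf{k}^{(n)}\left(\mathbf{y}^{(n)} - H\,\hat{\mathbf{x}}^{(n|n-1)}\right)$$

Our objective is to minimize the trace of the variance of our estimate xhat. This variance is P. This trace is our scalar cost function, which is quadratic in terms of xhat. We minimize it by finding the optimal gain k.

$$P^{(n|n)} = \mathrm{var}\left(\hat{\mathbf{x}}^{(n|n)}\right)$$

$$\mathrm{tr}\left(P^{(n|n)}\right) = \mathrm{tr}\,E\left[(\mathbf{x} - \hat{\mathbf{x}})(\mathbf{x} - \hat{\mathbf{x}})^T\right] = E\,\mathrm{tr}\left[(\mathbf{x} - \hat{\mathbf{x}})(\mathbf{x} - \hat{\mathbf{x}})^T\right] = E\left[(\mathbf{x} - \hat{\mathbf{x}})^T(\mathbf{x} - \hat{\mathbf{x}})\right]$$

$$\mathbf{k}^{(n)} = P^{(n|n-1)} H^T\left(H P^{(n|n-1)} H^T + R\right)^{-1}$$

$$P^{(n|n)} = \left(I - \mathbf{k}^{(n)} H\right) P^{(n|n-1)}$$

If we use this optimal k, then we can compute the variance in the next time step. Our cost (i.e., variance) of course still remains quadratic in terms of xhat.

$$P^{(n+1|n)} = A P^{(n|n)} A^T + Q = A\left(I - \mathbf{k}^{(n)} H\right) P^{(n|n-1)} A^T + Q$$

$$P^{(n+1|n)} = A\left(I - P^{(n|n-1)} H^T\left(H P^{(n|n-1)} H^T + R\right)^{-1} H\right) P^{(n|n-1)} A^T + Q$$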
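As a concrete illustration of the gain and variance updates above, here is a minimal NumPy sketch of one Kalman step. The function name `kalman_step` and the scalar example values are assumptions chosen for the illustration.

```python
import numpy as np

def kalman_step(P_prior, A, H, Q, R):
    """One measurement update followed by one time update of the uncertainty P."""
    S = H @ P_prior @ H.T + R                      # innovation covariance
    k = P_prior @ H.T @ np.linalg.inv(S)           # optimal gain
    P_post = (np.eye(P_prior.shape[0]) - k @ H) @ P_prior
    P_next = A @ P_post @ A.T + Q                  # prior uncertainty for the next step
    return k, P_post, P_next

# Scalar example: prior variance 1, measurement noise variance 1, state noise 0.25.
one = np.array([[1.0]])
k, P_post, P_next = kalman_step(one, A=one, H=one, Q=0.25 * one, R=one)
```

With equal prior and measurement variances the gain is 0.5, the posterior variance is halved, and the state noise Q is then added back to give the next prior.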
![Page 46: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/46.jpg)
The duality of the Kalman filter and optimal control, continued.

In the control problem, we have a model of how we think the hidden states x are related to commands u and observations y.

$$\mathbf{x}^{(k+1)} = A\,\mathbf{x}^{(k)} + C\,\mathbf{u}^{(k)} + \boldsymbol{\varepsilon}_x, \qquad \boldsymbol{\varepsilon}_x \sim N(\mathbf{0}, Q)$$

$$\mathbf{y}^{(k+1)} = B\,\mathbf{x}^{(k+1)} + \boldsymbol{\varepsilon}_y, \qquad \boldsymbol{\varepsilon}_y \sim N(\mathbf{0}, R)$$

Our objective is to find the u that minimizes a scalar cost. To find this u, we run time backwards!

$$J^{(0)} = \sum_{k=0}^{p-1} \mathbf{u}^{(k)T} L^{(k)} \mathbf{u}^{(k)} + \mathbf{y}^{(k+1)T} T^{(k+1)} \mathbf{y}^{(k+1)}$$

We start at the end time point and find the optimal u that minimizes the cost to go. When we find this u, we move back one time point, and so on.

$$J^{(p)} = \mathbf{x}^{(p)T} W^{(p)} \mathbf{x}^{(p)} + w^{(p)}$$

$$J^{(p-1)} = \mathbf{y}^{(p-1)T} T^{(p-1)} \mathbf{y}^{(p-1)} + \mathbf{u}^{(p-1)T} L^{(p-1)} \mathbf{u}^{(p-1)} + E\left[J^{(p)}\right]$$

$$\mathbf{u}^{(p-1)} = -G^{(p-1)}\,\hat{\mathbf{x}}^{(p-1)}$$

$$J^{(p-1)} = \mathbf{x}^{(p-1)T} W^{(p-1)} \mathbf{x}^{(p-1)} + w^{(p-1)}$$

The cost to go is a quadratic function of hidden states. This is very similar to the Kalman filter, where the cost was a quadratic function of the hidden states as well.
![Page 47: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/47.jpg)
Duality of optimal control and Kalman filter, continued.

Kalman Filter

$$\mathbf{k}^{(n)} = P^{(n|n-1)} H^T\left(R + H P^{(n|n-1)} H^T\right)^{-1}$$

$$P^{(n+1|n)} = A\left(I - P^{(n|n-1)} H^T\left(R + H P^{(n|n-1)} H^T\right)^{-1} H\right) P^{(n|n-1)} A^T + Q$$

Here P is the state uncertainty, R is the measurement noise, and Q is the state noise.

Optimal control

$$\mathbf{u}^{(k)} = -\left(L^{(k)} + C^T W^{(k+1)} C\right)^{-1} C^T W^{(k+1)} A\,\hat{\mathbf{x}}^{(k)}$$

$$W^{(k)} = A^T\left(I - W^{(k+1)} C\left(L^{(k)} + C^T W^{(k+1)} C\right)^{-1} C^T\right) W^{(k+1)} A + B^T T^{(k)} B$$

Here W is the weighting of state, L is the motor cost, and B^T T B is the tracking cost.

So W is like an estimate of the state uncertainty matrix, B^T T B is like the state update noise Q, and L is like the measurement noise R.

In optimal control, the motor commands are generated by applying a gain to the state. This gain is like the Kalman gain.
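The correspondence between the two recursions can be checked numerically. The sketch below is an illustration only, with made-up function names and random matrices: running the Kalman variance update on the transposed system (A replaced by its transpose, H by the transpose of C, R by L, and Q by the tracking cost) should reproduce one backward step of the control recursion.

```python
import numpy as np

def kalman_riccati(P, A, H, Q, R):
    """P(n+1|n) from P(n|n-1) for the estimation problem."""
    gain = P @ H.T @ np.linalg.inv(R + H @ P @ H.T)
    return A @ (np.eye(P.shape[0]) - gain @ H) @ P @ A.T + Q

def control_riccati(W, A, C, L, S):
    """W(k) from W(k+1) for the control problem; S stands for the tracking cost B' T B."""
    inner = W @ C @ np.linalg.inv(L + C.T @ W @ C) @ C.T
    return A.T @ (np.eye(W.shape[0]) - inner) @ W @ A + S

rng = np.random.default_rng(0)
n, m = 3, 2                                   # arbitrary state and command dimensions
A = rng.standard_normal((n, n))
C = rng.standard_normal((n, m))
M = rng.standard_normal((n, n))
W = M @ M.T + np.eye(n)                       # symmetric positive definite W(k+1)
L = 0.5 * np.eye(m)                           # motor cost, plays the role of R
S = np.eye(n)                                 # tracking cost, plays the role of Q

W_prev = control_riccati(W, A, C, L, S)
P_next = kalman_riccati(W, A.T, C.T, S, L)    # same step, written as a Kalman update
duality_gap = np.max(np.abs(W_prev - P_next))
```

The gap is zero up to floating point error, which is the sense in which the control gain "is like" the Kalman gain.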
![Page 48: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/48.jpg)
Noise characteristics of biological systems are not additive Gaussian
Noise in the motor output grows with the size of the motor command
The standard deviation of noise grows with mean force in an isometric task. Participants produced a given force with their thumb flexors. In one condition (labeled “voluntary”), the participants generated the force, whereas in another condition (labeled “NMES”) the experimenters stimulated their muscles artificially to produce force. To guide force production, the participants viewed a cursor that displayed thumb force, but the experimenters analyzed the data during a 4-s period in which this feedback had disappeared. A. Force produced by a typical participant. The period without visual feedback is marked by the horizontal bar in the 1st and 3rd columns (top right) and is expanded in the 2nd and 4th columns. B. When participants generated force, noise (measured as the standard deviation) increased linearly with force magnitude. Abbreviations: NMES, neuromuscular electrical stimulation; MVC, maximum voluntary contraction. From Jones et al. (2002) J Neurophysiol 88:1533.
Figure conditions: voluntary contraction of the muscle; electrical stimulation of the muscle (NMES).
![Page 49: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/49.jpg)
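Representing signal dependent noise

$$\mathbf{x}^{(k+1)} = A\,\mathbf{x}^{(k)} + B\,\mathbf{u}^{(k)} + \boldsymbol{\varepsilon}_x^{(k)} \qquad \text{(noise with standard deviation that grows linearly with } \mathbf{u}\text{)}$$

$$\mathbf{y}^{(k+1)} = H\,\mathbf{x}^{(k+1)} + \boldsymbol{\varepsilon}_y^{(k)} \qquad \text{(noise with standard deviation that grows linearly with } \mathbf{x}\text{)}$$

Each noise term is the sum of zero-mean Gaussian noise and a signal-dependent component: signal dependent motor noise in the state equation, and signal dependent sensory noise in the observation equation.

$$\boldsymbol{\phi} \sim N(\mathbf{0}, I) \qquad \text{Vector of zero mean, variance 1 random variables}$$

For example, with two motor commands:

$$\text{motor noise} = \begin{bmatrix} c_1 u_1^{(k)} \phi_1^{(k)} \\ c_2 u_2^{(k)} \phi_2^{(k)} \end{bmatrix} = \begin{bmatrix} c_1 & 0 \\ 0 & 0 \end{bmatrix}\mathbf{u}^{(k)} \phi_1^{(k)} + \begin{bmatrix} 0 & 0 \\ 0 & c_2 \end{bmatrix}\mathbf{u}^{(k)} \phi_2^{(k)}$$

$$C_1 \equiv \begin{bmatrix} c_1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad C_2 \equiv \begin{bmatrix} 0 & 0 \\ 0 & c_2 \end{bmatrix}$$

$$\text{motor noise} = \sum_i C_i\,\mathbf{u}^{(k)} \phi_i^{(k)}$$

$$\mathrm{var}\left[\text{motor noise}\right] = \sum_i C_i\,\mathbf{u}^{(k)}\,\mathrm{var}\left[\phi_i^{(k)}\right]\mathbf{u}^{(k)T} C_i^T = \sum_i C_i\,\mathbf{u}^{(k)} \mathbf{u}^{(k)T} C_i^T$$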
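A small simulation can make the variance formula concrete. The sketch below assumes two motor commands with hypothetical scaling constants c1 = 0.1 and c2 = 0.2; the function names are made up for the example. It samples the signal-dependent motor noise and compares the sample covariance with the closed-form sum.

```python
import numpy as np

def motor_noise_cov(u, C_list):
    """Closed-form covariance: sum_i C_i u u' C_i'."""
    return sum(Ci @ np.outer(u, u) @ Ci.T for Ci in C_list)

def sample_motor_noise(u, C_list, rng, n_samples):
    """Draw samples of sum_i C_i u phi_i with phi_i ~ N(0, 1), one sample per row."""
    phi = rng.standard_normal((n_samples, len(C_list)))
    basis = np.stack([Ci @ u for Ci in C_list])    # row i is the vector C_i u
    return phi @ basis

c1, c2 = 0.1, 0.2                                  # assumed noise scaling constants
C_list = [np.array([[c1, 0.0], [0.0, 0.0]]),
          np.array([[0.0, 0.0], [0.0, c2]])]
u = np.array([2.0, 3.0])

rng = np.random.default_rng(1)
samples = sample_motor_noise(u, C_list, rng, n_samples=200_000)
sample_cov = np.cov(samples, rowvar=False)
exact_cov = motor_noise_cov(u, C_list)             # diag(0.04, 0.36) for these values
```

Doubling u multiplies the covariance by four, so the standard deviation of the noise doubles, which is the linear growth with command size described above.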
![Page 50: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/50.jpg)
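Control problem with signal dependent noise (Todorov 2005)

$$\mathbf{x}^{(k+1)} = A\,\mathbf{x}^{(k)} + B\,\mathbf{u}^{(k)} + \boldsymbol{\varepsilon}_x^{(k)} + \sum_i C_i\,\mathbf{u}^{(k)} \phi_i^{(k)}$$

$$\mathbf{y}^{(k)} = H\,\mathbf{x}^{(k)} + \boldsymbol{\varepsilon}_y^{(k)} + \sum_i D_i\,\mathbf{x}^{(k)} \mu_i^{(k)}$$

$$\hat{\mathbf{x}}^{(k+1)} = A\,\hat{\mathbf{x}}^{(k)} + A K^{(k)}\left(\mathbf{y}^{(k)} - H\,\hat{\mathbf{x}}^{(k)}\right) + B\,\mathbf{u}^{(k)} + \boldsymbol{\eta}^{(k)}$$

$$\phi_i \sim N(0, 1), \qquad \mu_i \sim N(0, 1)$$

$$\boldsymbol{\varepsilon}_x \sim N(\mathbf{0}, Q_x), \qquad \boldsymbol{\varepsilon}_y \sim N(\mathbf{0}, Q_y), \qquad \boldsymbol{\eta} \sim N(\mathbf{0}, Q_{\hat{x}})$$

Cost per step:

$$\mathbf{u}^{(k)T} L\,\mathbf{u}^{(k)} + \mathbf{x}^{(k)T} T^{(k)} \mathbf{x}^{(k)}$$

To find the motor commands that minimize the total cost, we start at the last time step p and work backwards. At time step p, the cost is a quadratic function of x. At time step p-1, we can find the optimal u that minimizes the cost to go. When we find this optimal u, the cost to go at p-1 will be a quadratic function of x plus a quadratic function of x-xhat. In general, by induction we can prove that as long as we apply the optimal u, the cost to go will have this quadratic form. This proof is due to E. Todorov, Neural Computation, 2005.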
![Page 51: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/51.jpg)
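Cost at time step p (last time step)

$$J^{(p)} = \mathbf{x}^{(p)T} T^{(p)} \mathbf{x}^{(p)}$$

Cost-to-go at p-1

$$J^{(p-1)} = \mathbf{u}^{(p-1)T} L\,\mathbf{u}^{(p-1)} + \mathbf{x}^{(p-1)T} T^{(p-1)} \mathbf{x}^{(p-1)} + E\left[J^{(p)} \mid \mathbf{x}^{(p-1)}, \hat{\mathbf{x}}^{(p-1)}, \mathbf{u}^{(p-1)}\right]$$

$$E\left[J^{(p)} \mid \mathbf{x}^{(p-1)}, \hat{\mathbf{x}}^{(p-1)}, \mathbf{u}^{(p-1)}\right] = E\left[\mathbf{x}^{(p)}\right]^T T^{(p)} E\left[\mathbf{x}^{(p)}\right] + \mathrm{tr}\left(T^{(p)}\,\mathrm{var}\left[\mathbf{x}^{(p)}\right]\right)$$

$$\mathrm{var}\left[\mathbf{x}^{(p)} \mid \mathbf{x}^{(p-1)}, \mathbf{u}^{(p-1)}\right] = Q_x + \sum_i C_i\,\mathbf{u}^{(p-1)} \mathbf{u}^{(p-1)T} C_i^T$$

$$E\left[J^{(p)}\right] = \left(A\mathbf{x}^{(p-1)} + B\mathbf{u}^{(p-1)}\right)^T T^{(p)} \left(A\mathbf{x}^{(p-1)} + B\mathbf{u}^{(p-1)}\right) + \mathrm{tr}\left(T^{(p)} Q_x\right) + \sum_i \mathbf{u}^{(p-1)T} C_i^T T^{(p)} C_i\,\mathbf{u}^{(p-1)}$$

$$J^{(p-1)} = \mathbf{x}^{(p-1)T}\left(T^{(p-1)} + A^T T^{(p)} A\right)\mathbf{x}^{(p-1)} + \mathrm{tr}\left(T^{(p)} Q_x\right) + \mathbf{u}^{(p-1)T}\left(L + \sum_i C_i^T T^{(p)} C_i + B^T T^{(p)} B\right)\mathbf{u}^{(p-1)} + 2\,\mathbf{u}^{(p-1)T} B^T T^{(p)} A\,\mathbf{x}^{(p-1)}$$

Optimal u to minimize the cost-to-go at time step p-1

$$\frac{dJ^{(p-1)}}{d\mathbf{u}^{(p-1)}} = 2\left(L + B^T T^{(p)} B + \sum_i C_i^T T^{(p)} C_i\right)\mathbf{u}^{(p-1)} + 2\,B^T T^{(p)} A\,\mathbf{x}^{(p-1)} = 0$$

Because x is not observed, the command is computed from the estimate xhat:

$$\mathbf{u}^{(p-1)} = -\left(L + B^T T^{(p)} B + \sum_i C_i^T T^{(p)} C_i\right)^{-1} B^T T^{(p)} A\,\hat{\mathbf{x}}^{(p-1)}$$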
![Page 52: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/52.jpg)
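$$G^{(p-1)} = \left(L + B^T T^{(p)} B + \sum_i C_i^T T^{(p)} C_i\right)^{-1} B^T T^{(p)} A$$

$$\mathbf{u}^{(p-1)} = -G^{(p-1)}\,\hat{\mathbf{x}}^{(p-1)}$$

$$J^{(p-1)} = \mathbf{x}^{(p-1)T}\left(T^{(p-1)} + A^T T^{(p)} A\right)\mathbf{x}^{(p-1)} + \mathrm{tr}\left(T^{(p)} Q_x\right) + \hat{\mathbf{x}}^{(p-1)T} G^{(p-1)T} B^T T^{(p)} A\,\hat{\mathbf{x}}^{(p-1)} - 2\,\hat{\mathbf{x}}^{(p-1)T} G^{(p-1)T} B^T T^{(p)} A\,\mathbf{x}^{(p-1)}$$

For a symmetric matrix Z:

$$\hat{\mathbf{x}}^T Z\,\hat{\mathbf{x}} - 2\,\hat{\mathbf{x}}^T Z\,\mathbf{x} = (\mathbf{x} - \hat{\mathbf{x}})^T Z\,(\mathbf{x} - \hat{\mathbf{x}}) - \mathbf{x}^T Z\,\mathbf{x}$$

$$\mathbf{e}^{(p-1)} \equiv \mathbf{x}^{(p-1)} - \hat{\mathbf{x}}^{(p-1)}$$

$$W_x^{(p-1)} = T^{(p-1)} + A^T T^{(p)} A - A^T T^{(p)} B\,G^{(p-1)}$$

$$W_e^{(p-1)} = A^T T^{(p)} B\,G^{(p-1)}$$

$$w^{(p-1)} = \mathrm{tr}\left(T^{(p)} Q_x\right)$$

$$J^{(p-1)} = \mathbf{x}^{(p-1)T} W_x^{(p-1)} \mathbf{x}^{(p-1)} + \mathbf{e}^{(p-1)T} W_e^{(p-1)} \mathbf{e}^{(p-1)} + w^{(p-1)}$$

J(p-1) is the cost-to-go at time step p-1, assuming that the optimal u is produced at p-1. Note that unlike the cost at time step p, this cost-to-go is quadratic in terms of x and the error in estimation of x. So now we need to show that if we continue to produce the optimal u at each time step, the cost-to-go remains in this form for all time steps.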
![Page 53: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/53.jpg)
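$$J^{(k+1)} = \mathbf{x}^{(k+1)T} W_x^{(k+1)} \mathbf{x}^{(k+1)} + \mathbf{e}^{(k+1)T} W_e^{(k+1)} \mathbf{e}^{(k+1)} + w^{(k+1)}$$

$$J^{(k)} = \mathbf{u}^{(k)T} L\,\mathbf{u}^{(k)} + \mathbf{x}^{(k)T} T^{(k)} \mathbf{x}^{(k)} + E\left[J^{(k+1)} \mid \mathbf{x}^{(k)}, \hat{\mathbf{x}}^{(k)}, \mathbf{u}^{(k)}\right]$$

Conjecture: If at some time point k+1 the cost-to-go under an optimal control policy is quadratic in x and e, and provided that we produce a u that minimizes the cost-to-go at time step k, then the cost-to-go at time step k will also be quadratic. To prove this, our first step is to find the u that minimizes the cost-to-go at time step k, and then show that the resulting optimal cost-to-go remains in the quadratic form above.

To compute the expected value term, we need to do some work on the term e.

$$\mathbf{e}^{(k+1)} = \mathbf{x}^{(k+1)} - \hat{\mathbf{x}}^{(k+1)}$$

$$\mathbf{x}^{(k+1)} = A\,\mathbf{x}^{(k)} + B\,\mathbf{u}^{(k)} + \boldsymbol{\varepsilon}_x^{(k)} + \sum_i C_i\,\mathbf{u}^{(k)} \phi_i^{(k)}$$

$$\hat{\mathbf{x}}^{(k+1)} = A\,\hat{\mathbf{x}}^{(k)} + A K^{(k)}\left(H\,\mathbf{x}^{(k)} + \boldsymbol{\varepsilon}_y^{(k)} + \sum_i D_i\,\mathbf{x}^{(k)} \mu_i^{(k)} - H\,\hat{\mathbf{x}}^{(k)}\right) + B\,\mathbf{u}^{(k)} + \boldsymbol{\eta}^{(k)}$$

$$\mathbf{e}^{(k+1)} = \left(A - A K^{(k)} H\right)\mathbf{e}^{(k)} + \boldsymbol{\varepsilon}_x^{(k)} - A K^{(k)} \boldsymbol{\varepsilon}_y^{(k)} - \boldsymbol{\eta}^{(k)} + \sum_i C_i\,\mathbf{u}^{(k)} \phi_i^{(k)} - A K^{(k)} \sum_i D_i\,\mathbf{x}^{(k)} \mu_i^{(k)}$$

$$E\left[\mathbf{e}^{(k+1)} \mid \mathbf{x}^{(k)}, \hat{\mathbf{x}}^{(k)}, \mathbf{u}^{(k)}\right] = \left(A - A K^{(k)} H\right)\mathbf{e}^{(k)}$$

$$\mathrm{var}\left[\mathbf{e}^{(k+1)} \mid \mathbf{x}^{(k)}, \hat{\mathbf{x}}^{(k)}, \mathbf{u}^{(k)}\right] = Q_x + \sum_i C_i\,\mathbf{u}^{(k)} \mathbf{u}^{(k)T} C_i^T + A K^{(k)} Q_y K^{(k)T} A^T + \sum_i A K^{(k)} D_i\,\mathbf{x}^{(k)} \mathbf{x}^{(k)T} D_i^T K^{(k)T} A^T + Q_{\hat{x}}$$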
![Page 54: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/54.jpg)
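To compute the expected value of J(k+1), we compute the expected value of the two quadratic terms (the expected value of the third term is zero, as it is composed only of zero-mean random variables).

$$E\left[\mathbf{x}^{(k+1)T} W_x^{(k+1)} \mathbf{x}^{(k+1)} \mid \mathbf{x}^{(k)}, \hat{\mathbf{x}}^{(k)}, \mathbf{u}^{(k)}\right] = \left(A\mathbf{x}^{(k)} + B\mathbf{u}^{(k)}\right)^T W_x^{(k+1)} \left(A\mathbf{x}^{(k)} + B\mathbf{u}^{(k)}\right) + \mathrm{tr}\left(W_x^{(k+1)} Q_x\right) + \mathbf{u}^{(k)T}\left(\sum_i C_i^T W_x^{(k+1)} C_i\right)\mathbf{u}^{(k)}$$

$$E\left[\mathbf{e}^{(k+1)T} W_e^{(k+1)} \mathbf{e}^{(k+1)} \mid \mathbf{x}^{(k)}, \hat{\mathbf{x}}^{(k)}, \mathbf{u}^{(k)}\right] = \mathbf{e}^{(k)T}\left(A - A K^{(k)} H\right)^T W_e^{(k+1)} \left(A - A K^{(k)} H\right)\mathbf{e}^{(k)} + \mathrm{tr}\left(W_e^{(k+1)}\left(Q_x + Q_{\hat{x}} + A K^{(k)} Q_y K^{(k)T} A^T\right)\right) + \mathbf{u}^{(k)T}\left(\sum_i C_i^T W_e^{(k+1)} C_i\right)\mathbf{u}^{(k)} + \mathbf{x}^{(k)T}\left(\sum_i D_i^T K^{(k)T} A^T W_e^{(k+1)} A K^{(k)} D_i\right)\mathbf{x}^{(k)}$$

Short-hand for the three sums:

$$C_x^{(k)} \equiv \sum_i C_i^T W_x^{(k+1)} C_i, \qquad C_e^{(k)} \equiv \sum_i C_i^T W_e^{(k+1)} C_i, \qquad D^{(k)} \equiv \sum_i D_i^T K^{(k)T} A^T W_e^{(k+1)} A K^{(k)} D_i$$

$$J^{(k)} = \mathbf{u}^{(k)T} L\,\mathbf{u}^{(k)} + \mathbf{x}^{(k)T} T^{(k)} \mathbf{x}^{(k)} + E\left[J^{(k+1)} \mid \mathbf{x}^{(k)}, \hat{\mathbf{x}}^{(k)}, \mathbf{u}^{(k)}\right]$$

$$J^{(k)} = \mathbf{u}^{(k)T}\left(L + C_x^{(k)} + C_e^{(k)} + B^T W_x^{(k+1)} B\right)\mathbf{u}^{(k)} + 2\,\mathbf{u}^{(k)T} B^T W_x^{(k+1)} A\,\mathbf{x}^{(k)} + \text{terms that do not depend on } \mathbf{u}$$

$$\frac{dJ^{(k)}}{d\mathbf{u}^{(k)}} = 2\left(L + C_x^{(k)} + C_e^{(k)} + B^T W_x^{(k+1)} B\right)\mathbf{u}^{(k)} + 2\,B^T W_x^{(k+1)} A\,\mathbf{x}^{(k)} = 0$$

$$\mathbf{u}^{(k)} = -\left(L + C_x^{(k)} + C_e^{(k)} + B^T W_x^{(k+1)} B\right)^{-1} B^T W_x^{(k+1)} A\,\hat{\mathbf{x}}^{(k)} = -G^{(k)}\,\hat{\mathbf{x}}^{(k)}$$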
![Page 55: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/55.jpg)
$$J^{(k)} = \hat{\mathbf{x}}^{(k)T} G^{(k)T} B^T W_x^{(k+1)} A\,\hat{\mathbf{x}}^{(k)} - 2\,\hat{\mathbf{x}}^{(k)T} G^{(k)T} B^T W_x^{(k+1)} A\,\mathbf{x}^{(k)} + \mathbf{x}^{(k)T}\left(T^{(k)} + A^T W_x^{(k+1)} A + D^{(k)}\right)\mathbf{x}^{(k)} + \mathrm{tr}\left(W_x^{(k+1)} Q_x + W_e^{(k+1)}\left(Q_x + Q_{\hat{x}} + A K^{(k)} Q_y K^{(k)T} A^T\right)\right) + \mathbf{e}^{(k)T}\left(A - A K^{(k)} H\right)^T W_e^{(k+1)} \left(A - A K^{(k)} H\right)\mathbf{e}^{(k)} + w^{(k+1)}$$

$$J^{(k)} = \mathbf{x}^{(k)T}\left(T^{(k)} + A^T W_x^{(k+1)} A - A^T W_x^{(k+1)} B\,G^{(k)} + D^{(k)}\right)\mathbf{x}^{(k)} + \mathbf{e}^{(k)T}\left(A^T W_x^{(k+1)} B\,G^{(k)} + \left(A - A K^{(k)} H\right)^T W_e^{(k+1)} \left(A - A K^{(k)} H\right)\right)\mathbf{e}^{(k)} + \mathrm{tr}\left(W_x^{(k+1)} Q_x + W_e^{(k+1)}\left(Q_x + Q_{\hat{x}} + A K^{(k)} Q_y K^{(k)T} A^T\right)\right) + w^{(k+1)}$$

$$J^{(k)} = \mathbf{x}^{(k)T} W_x^{(k)} \mathbf{x}^{(k)} + \mathbf{e}^{(k)T} W_e^{(k)} \mathbf{e}^{(k)} + w^{(k)}$$

So we just showed that if at some time point k+1 the cost-to-go under an optimal control policy is quadratic in x and e, and provided that we produce a u that minimizes the cost-to-go at time step k, then the cost-to-go at time step k will also be quadratic. Since we had earlier shown that at time step p-1 the cost is quadratic in x and e, we now have the solution to our problem.
![Page 56: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/56.jpg)
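Summary: Control problem with signal dependent noise (Todorov 2005)

$$\mathbf{x}^{(k+1)} = A\,\mathbf{x}^{(k)} + B\,\mathbf{u}^{(k)} + \boldsymbol{\varepsilon}_x^{(k)} + \sum_i C_i\,\mathbf{u}^{(k)} \phi_i^{(k)}$$

$$\mathbf{y}^{(k)} = H\,\mathbf{x}^{(k)} + \boldsymbol{\varepsilon}_y^{(k)} + \sum_i D_i\,\mathbf{x}^{(k)} \mu_i^{(k)}$$

$$\hat{\mathbf{x}}^{(k+1)} = A\,\hat{\mathbf{x}}^{(k)} + A K^{(k)}\left(\mathbf{y}^{(k)} - H\,\hat{\mathbf{x}}^{(k)}\right) + B\,\mathbf{u}^{(k)} + \boldsymbol{\eta}^{(k)}$$

$$\phi_i \sim N(0, 1), \qquad \mu_i \sim N(0, 1)$$

$$\boldsymbol{\varepsilon}_x \sim N(\mathbf{0}, Q_x), \qquad \boldsymbol{\varepsilon}_y \sim N(\mathbf{0}, Q_y), \qquad \boldsymbol{\eta} \sim N(\mathbf{0}, Q_{\hat{x}})$$

Cost per step

$$\mathbf{u}^{(k)T} L\,\mathbf{u}^{(k)} + \mathbf{x}^{(k)T} T^{(k)} \mathbf{x}^{(k)}$$

$$\mathbf{u}^{(k)} = -G^{(k)}\,\hat{\mathbf{x}}^{(k)}$$

$$G^{(k)} = \left(L + \sum_i C_i^T W_x^{(k+1)} C_i + \sum_i C_i^T W_e^{(k+1)} C_i + B^T W_x^{(k+1)} B\right)^{-1} B^T W_x^{(k+1)} A$$

$$W_x^{(k)} = T^{(k)} + A^T W_x^{(k+1)} A - A^T W_x^{(k+1)} B\,G^{(k)} + \sum_i D_i^T K^{(k)T} A^T W_e^{(k+1)} A K^{(k)} D_i$$

$$W_e^{(k)} = A^T W_x^{(k+1)} B\,G^{(k)} + \left(A - A K^{(k)} H\right)^T W_e^{(k+1)} \left(A - A K^{(k)} H\right)$$

$$w^{(k)} = \mathrm{tr}\left(W_x^{(k+1)} Q_x + W_e^{(k+1)}\left(Q_x + Q_{\hat{x}} + A K^{(k)} Q_y K^{(k)T} A^T\right)\right) + w^{(k+1)}$$

For the last time step

$$W_x^{(p)} = T^{(p)}, \qquad W_e^{(p)} = 0, \qquad w^{(p)} = 0$$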
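The summary recursion can be sketched as a backward loop. This is a minimal illustration under stated assumptions, not Todorov's full algorithm: the function name `sdn_backward_pass` is made up, the estimator gains `K[k]` are taken as given (in the full method the controller and estimator are computed by alternating passes), and `T` is a list of p+1 state-cost matrices.

```python
import numpy as np

def sdn_backward_pass(A, B, H, C_list, D_list, K, T, L, Q_x, Q_y, Q_xhat):
    """Feedback gains G[0..p-1] for fixed estimator gains K[0..p-1]."""
    p = len(K)
    n = A.shape[0]
    W_x, W_e, w = T[p], np.zeros((n, n)), 0.0      # values for the last time step
    G = [None] * p
    for k in range(p - 1, -1, -1):
        AK = A @ K[k]
        C_term = sum(Ci.T @ (W_x + W_e) @ Ci for Ci in C_list)
        D_term = sum(Di.T @ AK.T @ W_e @ AK @ Di for Di in D_list)
        G[k] = np.linalg.solve(L + C_term + B.T @ W_x @ B, B.T @ W_x @ A)
        Q_e = Q_x + Q_xhat + AK @ Q_y @ AK.T
        w = np.trace(W_x @ Q_x + W_e @ Q_e) + w
        A_e = A - AK @ H
        # Both updates use the step k+1 values, so W_e is computed before W_x is overwritten.
        W_e_new = A.T @ W_x @ B @ G[k] + A_e.T @ W_e @ A_e
        W_x = T[k] + A.T @ W_x @ (A - B @ G[k]) + D_term
        W_e = W_e_new
    return G, W_x, W_e, w

# Scalar toy problem, two time steps; all numbers are illustrative.
one, zero = np.array([[1.0]]), np.array([[0.0]])
args = dict(A=one, B=one, H=one, K=[0.5 * one] * 2, T=[one] * 3, L=one,
            Q_x=zero, Q_y=zero, Q_xhat=zero)
G_plain, *_ = sdn_backward_pass(C_list=[], D_list=[], **args)
G_sdn, *_ = sdn_backward_pass(C_list=[0.5 * one], D_list=[], **args)
```

In this toy case, adding signal-dependent motor noise lowers the feedback gain at the last step from 0.5 to about 0.44: large commands now carry a variance penalty, so the controller acts more cautiously.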
![Page 57: Statistical learning and optimal control: A framework for biological learning and motor control](https://reader030.vdocuments.site/reader030/viewer/2022020117/56812c0e550346895d907d33/html5/thumbnails/57.jpg)
Computing a cost for the motor commands
$$\mathbf{x}^{(1)} = A\,\mathbf{x}^{(0)} + C\,\mathbf{u}^{(0)}$$

$$\mathbf{x}^{(2)} = A\,\mathbf{x}^{(1)} + C\,\mathbf{u}^{(1)} = A^2\mathbf{x}^{(0)} + A C\,\mathbf{u}^{(0)} + C\,\mathbf{u}^{(1)}$$

$$\mathbf{x}^{(3)} = A\,\mathbf{x}^{(2)} + C\,\mathbf{u}^{(2)} = A^3\mathbf{x}^{(0)} + A^2 C\,\mathbf{u}^{(0)} + A C\,\mathbf{u}^{(1)} + C\,\mathbf{u}^{(2)}$$

$$\mathbf{x}^{(k)} = A^k\mathbf{x}^{(0)} + \sum_{j=0}^{k-1} A^{k-1-j} C\,\mathbf{u}^{(j)}$$

$$\mathrm{var}\left[\mathbf{x}^{(k)}\right] = \sum_{j=0}^{k-1} A^{k-1-j} C\,\mathrm{var}\left[\mathbf{u}^{(j)}\right]\left(A^{k-1-j} C\right)^T$$

$$\mathrm{var}\left[\mathbf{u}^{(j)}\right] = \mathbf{u}^{(j)T}\mathbf{u}^{(j)}\,I$$

Because there is noise in the motor commands, it will produce variance in our state. The above equation shows that the variance at the end of the movement (at time k) is mostly influenced by the motor commands late in the movement, and less by commands that were produced early in the movement.

To see this, note that for a stable system, A is a matrix that becomes “smaller” when raised to a power. The larger the power, the smaller the resulting matrix. In the sum, we have a contribution from each motor command j. When j is zero (the very first command), A is raised to a very high power, so the noise in this command has little influence on the endpoint variance. When j is larger (commands near the end of the movement), A is raised to a small power, so the noise in these commands has a great deal of influence on the endpoint variance. Therefore, we have a natural cost function for the motor commands:

$$L^{(n)} = \left(A^{k-1-n} C\right)^T\left(A^{k-1-n} C\right)$$
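To make the weighting of early versus late commands concrete, the sketch below computes the cost matrix for each command in a short movement. The function name `command_weights` and the scalar system with A = 0.9 are assumptions for illustration; the point only holds when powers of A shrink, as they do for a stable system.

```python
import numpy as np

def command_weights(A, C, k):
    """Weight on command n (n = 0..k-1): (A^(k-1-n) C)' (A^(k-1-n) C)."""
    weights = []
    for n in range(k):
        M = np.linalg.matrix_power(A, k - 1 - n) @ C
        weights.append(M.T @ M)
    return weights

A = np.array([[0.9]])        # assumed stable scalar dynamics
C = np.array([[1.0]])
weights = command_weights(A, C, k=4)
values = [float(Wn[0, 0]) for Wn in weights]   # 0.9**6, 0.9**4, 0.9**2, 1.0
```

The weights grow toward the end of the movement, so noise in late commands is penalized more heavily than noise in early ones.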