Optimal Control Theory
Benjamin Dubois-Taine
July 22, 2020
The University of British Columbia
Introduction

Today we will cover:

• Discrete-time control and the Bellman equations
• Continuous-time control and the Hamilton-Jacobi-Bellman equations
• An important special case: Linear-Quadratic Gaussian and Linear-Quadratic Regulator problems
• Pontryagin's maximum principle
• (time allowing) Optimal estimation and the Kalman filter

This content is taken from [1, Chapter 12].
Discrete Control and the Bellman Equations

Define:

• $x \in X$: the state of the agent's environment
• $u \in \mathcal{U}(x)$: the action chosen at state $x$
• $\mathrm{next}(x, u) \in X$: the resulting state from applying action $u$ in state $x$
• $\mathrm{cost}(x, u) \ge 0$: the cost of applying $u$ in state $x$
Example: plane tickets

• $X$ = set of cities
• $\mathcal{U}(x)$ = flights available from city $x$
• $\mathrm{next}(x, u)$ = the city where flight $u$ lands
• $\mathrm{cost}(x, u)$ = price of the flight

Goal: find the cheapest way to get to your destination.
![Page 10: Optimal Control Theory - Home | Computer Science at UBC · 2020. 7. 30. · Optimal Control Theory Benjamin Dubois-Taine July 22, 2020 The University of British Columbia. Introduction](https://reader035.vdocuments.site/reader035/viewer/2022081621/6123654916ce3060c2611374/html5/thumbnails/10.jpg)
Example: plane tickets
• X = set of cities
• U(x) = flights available from city x
• next(x , u) the city where the flight lands
• cost(x , u) price of the flight
Goal: find cheapest way to get to your destination
3
![Page 11: Optimal Control Theory - Home | Computer Science at UBC · 2020. 7. 30. · Optimal Control Theory Benjamin Dubois-Taine July 22, 2020 The University of British Columbia. Introduction](https://reader035.vdocuments.site/reader035/viewer/2022081621/6123654916ce3060c2611374/html5/thumbnails/11.jpg)
Discrete Control and the Bellman Equations

Goal: find the action sequence $(u_0, \dots, u_{n-1})$ minimizing the total cost

$$J(x_\cdot, u_\cdot) = \sum_{k=0}^{n-1} \mathrm{cost}(x_k, u_k)$$

where $x_{k+1} = \mathrm{next}(x_k, u_k)$, and $x_0$ and $x_n$ are given.

• We can think of this as a graph where nodes are states, and actions are arrows connecting the nodes.
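The objective above is a simple fold over the action sequence. A minimal Python sketch, with made-up toy dynamics and costs (not from the slides):

```python
# Toy illustration of the discrete-time setup.
# States are small integers; next() and cost() are hypothetical.

def next_state(x, u):
    # deterministic toy dynamics: action u moves us to state u
    return u

def cost(x, u):
    # hypothetical cost: moving between states costs |x - u|
    return abs(x - u)

def total_cost(x0, actions):
    """J(x., u.) = sum_k cost(x_k, u_k), with x_{k+1} = next(x_k, u_k)."""
    J, x = 0, x0
    for u in actions:
        J += cost(x, u)
        x = next_state(x, u)
    return J, x
```

For example, starting at state 0 and applying actions (2, 1) accumulates cost along the induced trajectory and returns the final state.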
Discrete Control and the Bellman Equations

We need a control law, namely a mapping from states to actions. Defining the optimal value function as

$$v(x) = \min_{u \in \mathcal{U}(x)} \{\mathrm{cost}(x, u) + v(\mathrm{next}(x, u))\} \tag{1}$$

the associated optimal control law is

$$\pi(x) = \arg\min_{u \in \mathcal{U}(x)} \{\mathrm{cost}(x, u) + v(\mathrm{next}(x, u))\} \tag{2}$$

These are the Bellman equations.
Discrete Control and the Bellman Equations

We want to solve

$$v(x) = \min_{u \in \mathcal{U}(x)} \{\mathrm{cost}(x, u) + v(\mathrm{next}(x, u))\}$$
$$\pi(x) = \arg\min_{u \in \mathcal{U}(x)} \{\mathrm{cost}(x, u) + v(\mathrm{next}(x, u))\}$$

Let's go back to the graph analogy. Assume the graph is acyclic. Suppose we start at $x_0$ and want to reach $x_f$.

• set $v(x_f) = 0$
• once every successor of a state $x$ has been visited, apply the formula for $v$ to $x$
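This backward pass can be sketched on the plane-ticket example. The cities and prices below are invented for illustration; the recursion visits all successors of a city before applying the Bellman formula to it, which is valid because the graph is acyclic:

```python
# Backward induction on an acyclic graph (toy plane-ticket instance).
flights = {  # city -> {destination: price}; all numbers are made up
    "YVR": {"SEA": 120, "SFO": 250},
    "SEA": {"SFO": 90, "LAX": 180},
    "SFO": {"LAX": 80},
    "LAX": {},  # the destination x_f
}

def solve(graph, goal):
    v, pi = {goal: 0.0}, {}
    def value(x):
        # memoized recursion: successors are evaluated first,
        # then the Bellman formula is applied at x
        if x not in v:
            u, c = min(graph[x].items(),
                       key=lambda kv: kv[1] + value(kv[0]))
            v[x], pi[x] = c + value(u), u
        return v[x]
    for x in graph:
        value(x)
    return v, pi
```

Here `solve(flights, "LAX")` returns the cheapest cost-to-go from every city and the control law, i.e. which flight to take next.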
Discrete Control and the Bellman Equations

• For cyclic graphs, this approach will not work.
• The Bellman equations are still valid.
• We need to design iterative schemes: Value Iteration and Policy Iteration.
Discrete Control and the Bellman Equations

Value Iteration proceeds as follows:

• start with some guess $v^{(0)}$ of the optimal value function
• construct a sequence of guesses

$$v^{(i+1)}(x) = \min_{u \in \mathcal{U}(x)} \{\mathrm{cost}(x, u) + v^{(i)}(\mathrm{next}(x, u))\}$$

This algorithm can be shown to converge at a linear rate [2]. Each iteration costs $O(|X||U|)$.
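Vectorized over a finite state space, one Value Iteration sweep is a single array expression. A minimal sketch on a made-up four-state problem with an absorbing, zero-cost goal state (all tables below are hypothetical):

```python
import numpy as np

# Toy deterministic problem. Rows index states, columns index actions.
nxt = np.array([[1, 2], [3, 2], [3, 3], [3, 3]])            # next(x, u)
cst = np.array([[1., 4.], [2., 1.], [1., 5.], [0., 0.]])    # cost(x, u)
n_states = 4                                                # state 3 is the goal

v = np.zeros(n_states)                 # initial guess v^(0)
for _ in range(50):
    q = cst + v[nxt]                   # q[x, u] = cost(x, u) + v(next(x, u))
    v_new = q.min(axis=1)              # one Bellman backup, O(|X||U|)
    if np.allclose(v_new, v):
        break
    v = v_new
policy = q.argmin(axis=1)              # greedy control law at convergence
```

On this instance the iterates converge in a handful of sweeps to the exact cost-to-go.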
Discrete Control and the Bellman Equations

Policy Iteration proceeds as follows:

• start with some guess $\pi^{(0)}$ of the optimal control law
• construct a sequence of guesses

$$v^{\pi^{(i)}}(x) = \mathrm{cost}(x, \pi^{(i)}(x)) + v^{\pi^{(i)}}(\mathrm{next}(x, \pi^{(i)}(x)))$$
$$\pi^{(i+1)}(x) = \arg\min_{u \in \mathcal{U}(x)} \{\mathrm{cost}(x, u) + v^{\pi^{(i)}}(\mathrm{next}(x, u))\}$$

We need to relax the first line or solve a system of linear equations. Under certain assumptions, this is faster than Value Iteration [3]; however, each iteration is more costly.
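A sketch of the exact-solve variant on the same style of toy problem. To keep the policy-evaluation system nonsingular this sketch adds a discount factor $\gamma < 1$, an assumption not made on the slide (there, the absorbing goal state plays that role):

```python
import numpy as np

# Policy Iteration with exact policy evaluation: solve v = cost_pi + gamma * P_pi v.
# All tables are made up; gamma < 1 is an added assumption for nonsingularity.
gamma = 0.9
nxt = np.array([[1, 2], [3, 2], [3, 3], [3, 3]])
cst = np.array([[1., 4.], [2., 1.], [1., 5.], [0., 0.]])
n = len(nxt)

pi = np.zeros(n, dtype=int)            # initial guess pi^(0)
for _ in range(20):
    # policy evaluation: (I - gamma * P_pi) v = cost_pi  (linear solve)
    P = np.zeros((n, n))
    P[np.arange(n), nxt[np.arange(n), pi]] = 1.0
    v = np.linalg.solve(np.eye(n) - gamma * P, cst[np.arange(n), pi])
    # policy improvement: greedy with respect to v
    pi_new = (cst + gamma * v[nxt]).argmin(axis=1)
    if np.array_equal(pi_new, pi):
        break
    pi = pi_new
```

The linear solve is what makes each iteration more costly than a Value Iteration sweep, but the policy typically stabilizes in very few iterations.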
Discrete Control and the Bellman Equations

• It is also of interest to consider the stochastic setting, where we have $p(y \mid x, u)$ = "probability that $\mathrm{next}(x, u) = y$".
• The Bellman equation for the optimal control law becomes

$$\pi(x) = \arg\min_{u \in \mathcal{U}(x)} \{\mathrm{cost}(x, u) + \mathbb{E}[v(\mathrm{next}(x, u))]\}$$

• Everything we have seen so far generalizes to this setting.
• This is called a Markov Decision Process (MDP).
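In the stochastic setting the only change to the backup is that $v(\mathrm{next}(x,u))$ becomes an expectation under $p(\cdot \mid x, u)$. A one-step sketch with invented transition probabilities:

```python
import numpy as np

# One stochastic Bellman backup. p[x, u, y] = probability that the
# next state is y, given state x and action u (toy numbers).
p = np.array([
    [[0.0, 0.8, 0.2], [0.0, 0.1, 0.9]],   # from state 0
    [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]],   # from state 1
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # state 2 is absorbing
])
cost = np.array([[1., 2.], [1., 3.], [0., 0.]])
v = np.array([5., 2., 0.])                # some current value guess

# q[x, u] = cost(x, u) + E[v(next(x, u))] = cost + sum_y p(y|x,u) v(y)
q = cost + (p * v).sum(axis=2)
pi = q.argmin(axis=1)
```

Note that from state 0 the pricier action becomes optimal because it reaches the low-value state with high probability; the expectation, not the immediate cost, decides.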
Continuous Control

• The state $x \in \mathbb{R}^{n_x}$ and actions $u \in \mathcal{U}(x) \subset \mathbb{R}^{n_u}$ are real-valued vectors.
• Assume that our trajectory is given by

$$dx = f(x, u)\,dt + F(x, u)\,dw$$

where $w$ is $n_w$-dimensional Brownian motion. We can also write this as

$$x(t) = x(0) + \int_0^t f(x(s), u(s))\,ds + \int_0^t F(x(s), u(s))\,dw(s)$$

• Discretizing into time steps of size $\Delta$, i.e. $t = k\Delta$, gives

$$x_{k+1} = x_k + \Delta f(x_k, u_k) + \sqrt{\Delta}\, F(x_k, u_k)\,\varepsilon_k \tag{3}$$

where $\varepsilon_k \sim \mathcal{N}(0, I_{n_w})$ and $x_k = x(k\Delta)$.
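Equation (3) is the Euler-Maruyama scheme, and it can be simulated directly. A sketch for a scalar toy system with $f(x, u) = -x + u$ and constant noise gain $F = 0.1$ (dynamics chosen for illustration, not from the slides):

```python
import numpy as np

# Euler-Maruyama simulation of x_{k+1} = x_k + dt*f(x_k,u_k) + sqrt(dt)*F*eps_k
# for the hypothetical scalar system f(x, u) = -x + u.
rng = np.random.default_rng(0)

def simulate(x0, u_seq, dt=0.01, F=0.1):
    xs = [x0]
    for u in u_seq:
        eps = rng.standard_normal()          # eps_k ~ N(0, 1)
        x = xs[-1]
        xs.append(x + dt * (-x + u) + np.sqrt(dt) * F * eps)
    return np.array(xs)

traj = simulate(1.0, u_seq=np.zeros(500))    # uncontrolled decay toward 0
```

With $F = 0$ the scheme reduces to the deterministic Euler step $x_{k+1} = (1 - \Delta)x_k$, which is a convenient sanity check.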
Continuous Control

• We also need a cost function.
• For now, assume finite-horizon problems, i.e. a final time $t_f$ is specified.
• Separate the total cost into a cost rate $\ell$ and a final cost $h$.
• The total cost is then

$$J(x_\cdot, u_\cdot) = h(x(t_f)) + \int_0^{t_f} \ell(x(t), u(t), t)\,dt$$

• Discretizing gives

$$J(x_\cdot, u_\cdot) = h(x_n) + \Delta \sum_{k=0}^{n-1} \ell(x_k, u_k, k\Delta) \tag{4}$$

where $n = t_f / \Delta$.
Continuous Control

• To summarize, we have

$$x_{k+1} = x_k + \Delta f(x_k, u_k) + \sqrt{\Delta}\, F(x_k, u_k)\,\varepsilon_k \tag{5}$$

with $\varepsilon_k \sim \mathcal{N}(0, I_{n_w})$, and

$$J(x_\cdot, u_\cdot) = h(x_n) + \Delta \sum_{k=0}^{n-1} \ell(x_k, u_k, k\Delta) \tag{6}$$

• From (5) we can see that

$$x_{k+1} = x_k + \Delta f(x_k, u_k) + \xi$$

where $\xi \sim \mathcal{N}(0, \Delta S(x_k, u_k))$ and $S(x, u) = F(x, u)F(x, u)^T$.
• With this we can define the optimal value function similarly to the discrete case:

$$v(x, k) = \min_u \{\Delta \ell(x, u, k\Delta) + \mathbb{E}[v(x + \Delta f(x, u) + \xi,\, k + 1)]\} \tag{7}$$
Continuous Control

• We will simplify $\mathbb{E}[v(x + \Delta f(x, u) + \xi)]$.
• Setting $\delta = \Delta f(x, u) + \xi$, a Taylor expansion gives

$$v(x + \delta) = v(x) + \delta^T v_x(x) + \frac{1}{2}\delta^T v_{xx}(x)\,\delta + O(\|\delta\|^3)$$

• Then

$$\mathbb{E}[v(x + \delta)] = v(x) + \Delta f(x, u)^T v_x(x) + \frac{1}{2}\mathbb{E}[\xi^T v_{xx}(x)\,\xi] + O(\Delta^2)$$

• Now,

$$\mathbb{E}[\xi^T v_{xx}\xi] = \mathbb{E}[\mathrm{tr}(\xi^T v_{xx}\xi)] = \mathbb{E}[\mathrm{tr}(\xi\xi^T v_{xx})] = \mathrm{tr}(\mathrm{Cov}[\xi]\,v_{xx}) = \mathrm{tr}(\Delta S\, v_{xx})$$
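The trace identity above is easy to verify numerically by Monte Carlo. A sketch with made-up matrices (the specific $F$, $v_{xx}$, and $\Delta$ below are arbitrary choices):

```python
import numpy as np

# Numerical check of E[xi^T v_xx xi] = tr(Cov[xi] v_xx) = tr(dt * S * v_xx),
# where xi ~ N(0, dt*S) and S = F F^T. All matrices are toy values.
rng = np.random.default_rng(1)
dt = 0.1
F = np.array([[1.0, 0.0], [0.5, 2.0]])
S = F @ F.T
vxx = np.array([[2.0, 0.3], [0.3, 1.0]])   # symmetric Hessian of v

xi = rng.multivariate_normal(np.zeros(2), dt * S, size=200_000)
mc = np.einsum("ni,ij,nj->n", xi, vxx, xi).mean()   # sample E[xi^T vxx xi]
exact = np.trace(dt * S @ vxx)
```

The Monte Carlo estimate agrees with the closed-form trace up to sampling error, which shrinks as the number of samples grows.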
Continuous Control

Going back to

$$v(x, k) = \min_u \{\Delta \ell(x, u, k\Delta) + \mathbb{E}[v(x + \Delta f(x, u) + \xi,\, k + 1)]\}$$

and with

$$\mathbb{E}[v(x + \delta)] = v(x) + \Delta f(x, u)^T v_x(x) + \frac{1}{2}\mathrm{tr}(\Delta S(x, u)\,v_{xx}(x)) + O(\Delta^2)$$

we get

$$\frac{v(x, k) - v(x, k+1)}{\Delta} = \min_u \left\{\ell + f^T v_x + \frac{1}{2}\mathrm{tr}(S v_{xx})\right\}$$
Continuous Control
v(x , k)− v(x , k + 1)
∆= min
u
{`+ f T vx +
1
2tr(Svxx)
}and recall that k in v(x , k) represents time k∆, so that the LHS is
v(x , t)− v(x , t + ∆)
∆
As ∆→ 0, this is − ∂∂t v , which we denote −vt . So for v(x , tf ) = h(x)
and 0 ≤ t ≤ tf , we have
−vt(x , t) = minu
{`(x , u, t) + f (x , u)T vx(x) +
1
2tr(S(x , u)vxx(x))
}(8)
and the associated optimal control law
π(x , t) = arg minu
{`(x , u, t) + f (x , u)T vx(x) +
1
2tr(S(x , u)vxx(x))
}(9)
Those are the Hamilton-Jacobi-Bellman (HJB) equations.
16
Continuous Control: solving the HJB Equations

  −v_t(x, t) = min_u { ℓ(x, u, t) + f(x, u)ᵀ v_x(x) + ½ tr( S(x, u) v_xx(x) ) }

with v(x, tf) = h(x).

• Nonlinear second-order PDE.
• May not have a classical solution.
• Numerical methods relying on "viscosity solutions" exist,
• but they suffer from the "curse of dimensionality".
• Several methods for approximate solutions exist and work well in practice.
Continuous Control: Infinite Horizon

Two infinite-horizon costs are used in practice:

• Discounted cost formulation

  J(x·, u·) = ∫₀^∞ exp(−αt) ℓ(x(t), u(t)) dt

• Average cost per stage formulation

  J(x·, u·) = lim_{tf→∞} (1/tf) ∫₀^{tf} ℓ(x(t), u(t)) dt

Both formulations lead to similar HJB equations, except that they do not
depend on time. In that sense they are easier to solve using numerical
approximations. However, the finite-horizon problem also has advantages.
Linear-Quadratic-Gaussian control

• An important class of optimal control problems
• unlike many other problems, it is possible to find a closed-form solution
• we will derive solutions in both the continuous and discrete cases
LQG: the Continuous Case

We make the following assumptions:

• dynamics: dx = (Ax + Bu) dt + F dw
• cost rate: ℓ(x, u) = ½ uᵀRu + ½ xᵀQx
• final cost: h(x) = ½ xᵀQ^f x

where R, Q and Q^f are symmetric, R is positive definite, and we set S = FFᵀ.

Recall the HJB equation

  −v_t(x, t) = min_u { ℓ(x, u, t) + f(x, u)ᵀ v_x(x) + ½ tr( S(x, u) v_xx(x) ) }

with v(x, tf) = h(x). In our case it reads

  −v_t(x, t) = min_u { ½ uᵀRu + ½ xᵀQx + (Ax + Bu)ᵀ v_x(x) + ½ tr( S v_xx(x) ) }

with v(x, tf) = ½ xᵀQ^f x.
LQG: the Continuous Case

  −v_t(x, t) = min_u { ½ uᵀRu + ½ xᵀQx + (Ax + Bu)ᵀ v_x(x) + ½ tr( S v_xx(x) ) }

• We make the following guess: v(x, t) = ½ xᵀV(t)x + a(t)
• the derivatives in the HJB equation are
  • v_t(x, t) = ½ xᵀV̇(t)x + ȧ(t)
  • v_x(x) = V(t)x
  • v_xx(x) = V(t)
LQG: the Continuous Case

Plugging back into the HJB equation gives

  −v_t(x, t) = min_u { ½ uᵀRu + ½ xᵀQx + (Ax + Bu)ᵀV(t)x + ½ tr( S V(t) ) }

This is simply a quadratic in u, whose minimizer is

  u* = −R⁻¹BᵀV(t)x

and thus

  −v_t(x, t) = ½ xᵀ( Q + AᵀV(t) + V(t)A − V(t)BR⁻¹BᵀV(t) )x + ½ tr( S V(t) )

Because v_t(x, t) = ½ xᵀV̇(t)x + ȧ(t), this gives

  −V̇(t) = Q + AᵀV(t) + V(t)A − V(t)BR⁻¹BᵀV(t)    (10)
  −ȧ(t) = ½ tr( S V(t) )

This is a continuous-time Riccati equation.
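The minimizer u* can be sanity-checked numerically: the u-dependent part of the HJB right-hand side is g(u) = ½uᵀRu + (Ax + Bu)ᵀVx, and its gradient Ru + BᵀVx must vanish at u* = −R⁻¹BᵀVx. The matrices below are arbitrary test data, not a specific system.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
M = rng.standard_normal((m, m))
R = M @ M.T + np.eye(m)             # symmetric positive definite
W = rng.standard_normal((n, n))
V = (W + W.T) / 2                   # symmetric, plays the role of V(t)
x = rng.standard_normal(n)

u_star = -np.linalg.solve(R, B.T @ V @ x)   # u* = −R⁻¹BᵀVx
grad = R @ u_star + B.T @ V @ x             # ∇_u g(u) evaluated at u*
print(np.allclose(grad, 0))                 # True
```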
LQG: the Continuous Case

  −V̇(t) = Q + AᵀV(t) + V(t)A − V(t)BR⁻¹BᵀV(t)
  −ȧ(t) = ½ tr( S V(t) )

The boundary condition v(x, tf) = ½ xᵀQ^f x implies that V(tf) = Q^f and
a(tf) = 0.

⇒ This is a simple ODE, which is easy to solve.

The optimal control law is given by

  u* = −R⁻¹BᵀV(t)x

• It does not depend on the noise.
• It remains the same in the deterministic case, which is called the
  linear-quadratic regulator.
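Solving the Riccati ODE backward from V(tf) = Q^f can be done with a plain Euler scheme. This is a sketch on an assumed example system (a double integrator with Q = Q^f = I, R = 1, none of which come from the slides); for a long horizon, V(0) approaches the steady-state solution of the algebraic Riccati equation.

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator (assumed example)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Qf = np.eye(2)

def riccati_rhs(V):
    # right-hand side of −V̇ = Q + AᵀV + VA − V B R⁻¹ Bᵀ V
    return Q + A.T @ V + V @ A - V @ B @ np.linalg.solve(R, B.T @ V)

tf, dt = 5.0, 1e-3
V = Qf.copy()
for _ in range(int(tf / dt)):
    V = V + dt * riccati_rhs(V)          # Euler step from t down to t − dt
    V = (V + V.T) / 2                    # re-symmetrize against numerical drift

L = np.linalg.solve(R, B.T @ V)          # feedback gain: u* = −L x at t = 0
print(V)
```

For production use a stiff ODE integrator (or, for the infinite-horizon gain, a dedicated algebraic Riccati solver) would be preferable to fixed-step Euler.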
LQR: the Discrete Case

We make the following assumptions:

• dynamics: x_{k+1} = Ax_k + Bu_k
• cost: ℓ(x_k, u_k) = ½ u_kᵀRu_k + ½ x_kᵀQx_k
• final cost: h(x_n) = ½ x_nᵀQ^f x_n

where R, Q and Q^f are symmetric and R is positive definite.

Recall the Bellman equation

  v(x, k) = min_u { ℓ(x, u, k) + v(next(x, u), k + 1) }

with v(x, n) = h(x).

Again we make the guess that

  v(x, k) = ½ xᵀV_k x
LQR: the Discrete Case

The boundary constraint gives V_n = Q^f.

Plugging everything in gives

  ½ xᵀV_k x = min_u { ½ uᵀRu + ½ xᵀQx + ½ (Ax + Bu)ᵀV_{k+1}(Ax + Bu) }

This is simply a quadratic in u, and we get

  V_k = Q + AᵀV_{k+1}A − AᵀV_{k+1}B(R + BᵀV_{k+1}B)⁻¹BᵀV_{k+1}A

which is a discrete-time Riccati equation, with the associated optimal
control law

  u_k = −L_k x_k    where L_k = (R + BᵀV_{k+1}B)⁻¹BᵀV_{k+1}A
LQR: the Discrete Case

  V_k = Q + AᵀV_{k+1}A − AᵀV_{k+1}B(R + BᵀV_{k+1}B)⁻¹BᵀV_{k+1}A

• Start with V_n = Q^f and iterate backwards.
• Can be computed offline.
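The backward recursion above translates directly into code. A minimal sketch, using an assumed discretized double-integrator system (the matrices are illustrative, not from the slides): compute the gains offline, then roll the closed loop forward with u_k = −L_k x_k.

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])    # discretized double integrator (assumed)
B = np.array([[0.0], [dt]])
Q = dt * np.eye(2)
R = dt * np.array([[1.0]])
Qf = np.eye(2)
n_steps = 200

# Backward Riccati recursion starting from V_n = Qf
V = Qf.copy()
gains = []
for _ in range(n_steps):
    # L_k = (R + Bᵀ V_{k+1} B)⁻¹ Bᵀ V_{k+1} A
    L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
    # V_k = Q + Aᵀ V_{k+1} A − Aᵀ V_{k+1} B L_k
    V = Q + A.T @ V @ A - A.T @ V @ B @ L
    gains.append(L)
gains.reverse()                          # gains[k] now corresponds to stage k

# Closed-loop rollout u_k = −L_k x_k from an arbitrary initial state
x = np.array([1.0, 0.0])
for L in gains:
    x = A @ x - B @ (L @ x)
print(np.linalg.norm(x))                 # state driven close to the origin
```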
Deterministic Control: Pontryagin's Maximum Principle

• Another approach to optimal control theory
• developed in the Soviet Union by Pontryagin
• applies only to deterministic problems
• avoids the curse of dimensionality
• applies in both continuous and discrete time
Pontryagin's Maximum Principle: The Continuous Case

Setting:

• dynamics: dx = f(x(t), u(t)) dt
• cost rate: ℓ(x(t), u(t), t)
• final cost: h(x(tf))

with fixed x0 and final time tf.

Recall the HJB equation

  −v_t(x, t) = min_u { ℓ(x, u, t) + f(x, u)ᵀ v_x(x) + ½ tr( S(x, u) v_xx(x) ) }

Because we are in the deterministic case, this becomes

  −v_t(x, t) = min_u { ℓ(x, u, t) + f(x, u)ᵀ v_x(x) }

Suppose the optimal control law is given by u = π(x, t).
Pontryagin's Maximum Principle: The Continuous Case

  −v_t(x, t) = ℓ(x, π(x, t), t) + f(x, π(x, t))ᵀ v_x(x, t)

Taking derivatives w.r.t. x:

  0 = v_tx + ℓ_x + π_xᵀ ℓ_u + f_xᵀ v_x + π_xᵀ f_uᵀ v_x + v_xx f

Observe that v̇_x = v_xx ẋ + v_tx = v_xx f + v_tx, so

  0 = v̇_x + ℓ_x + f_xᵀ v_x + π_xᵀ( ℓ_u + f_uᵀ v_x )

Observe that ℓ_u + f_uᵀ v_x = ℓ_u(x, π(x, t), t) + f_u(x, π(x, t))ᵀ v_x(x, t) = 0,
since π(x, t) minimizes ℓ + fᵀv_x over u.
Pontryagin's Maximum Principle: The Continuous Case

We then get

  −v̇_x(x, t) = f_x(x, π(x, t))ᵀ v_x(x, t) + ℓ_x(x, π(x, t), t)

Setting p = v_x, this gives

  −ṗ(t) = f_x(x, π(x, t))ᵀ p(t) + ℓ_x(x, π(x, t), t)

The maximum principle thus reads

  ẋ(t) = f(x(t), u(t))
  −ṗ(t) = f_x(x(t), u(t))ᵀ p(t) + ℓ_x(x(t), u(t), t)
  u(t) = argmin_u { ℓ(x(t), u, t) + f(x(t), u)ᵀ p(t) }

with boundary conditions p(tf) = v_x(x(tf), tf) = h_x(x(tf)), and x0, tf given.
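These three conditions form a two-point boundary value problem, which single shooting can solve. A sketch on a toy scalar problem (assumed here, not from the slides): dynamics ẋ = u, cost rate ½u², final cost ½x(tf)². The adjoint equation gives ṗ = 0 (p constant), minimizing the Hamiltonian gives u = −p, and shooting reduces to a one-dimensional root find on p(0), here done by bisection.

```python
x0, tf, dt = 1.0, 1.0, 1e-3

def residual(p0):
    # integrate x forward with u = −p; for this problem p(t) ≡ p0
    x = x0
    for _ in range(int(tf / dt)):
        x += dt * (-p0)
    return p0 - x          # shooting residual: want p(tf) = h_x(x(tf)) = x(tf)

# bisection on the shooting parameter p0
lo, hi = -10.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if residual(lo) * residual(mid) <= 0:
        hi = mid
    else:
        lo = mid
p_star = 0.5 * (lo + hi)
print(p_star)              # analytic solution is p = x0 / (1 + tf)
```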
Pontryagin's Maximum Principle: The Continuous Case

Defining the Hamiltonian H(x, u, p, t) := ℓ(x, u, t) + f(x, u)ᵀp, the
maximum principle reads

  ẋ(t) = f(x(t), u(t))
  −ṗ(t) = f_x(x(t), u(t))ᵀ p(t) + ℓ_x(x(t), u(t), t)
  u(t) = argmin_u H(x(t), u, p(t), t)

with p(tf) = h_x(x(tf)).

• Simple ODE; the cost grows linearly with n_x.
• Existing software packages can solve it.
• The only difficulty is minimizing the Hamiltonian.
• For problems where the dynamics are linear and the cost is quadratic
  w.r.t. the control u, a nice closed-form formula exists.
Pontryagin’s Maximum Principle: The Discrete Case
• Derivation in the continuous and discrete case is also possible using
Lagrange multipliers
• Optimization using gradient descent is possible in the discrete case
32
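That gradient-descent approach can be sketched on a hypothetical scalar linear-quadratic instance (all numbers below are illustrative assumptions): the discrete maximum principle gives the gradient of the total cost with respect to each control via a backward costate recursion, and plain gradient descent on the open-loop control sequence then decreases the cost.

```python
# Assumed scalar instance: x_{k+1} = a*x_k + b*u_k with cost
# sum_k (q*x_k^2 + r*u_k^2)/2 + q*x_N^2/2. The discrete maximum principle
# gives the costate recursion p_k = a*p_{k+1} + q*x_k with p_N = q*x_N,
# and the gradient dJ/du_k = r*u_k + b*p_{k+1}.

a, b, q, r, N, x0 = 0.9, 0.5, 1.0, 0.1, 20, 1.0

def rollout(u):
    x = [x0]
    for k in range(N):
        x.append(a * x[k] + b * u[k])
    return x

def cost(u):
    x = rollout(u)
    J = sum(0.5 * (q * x[k] ** 2 + r * u[k] ** 2) for k in range(N))
    return J + 0.5 * q * x[N] ** 2

def gradient(u):
    x = rollout(u)
    p = [0.0] * (N + 1)
    p[N] = q * x[N]                  # terminal condition p_N = h_x(x_N)
    for k in range(N - 1, -1, -1):   # costate runs backward in time
        p[k] = a * p[k + 1] + q * x[k]
    return [r * u[k] + b * p[k + 1] for k in range(N)]

u = [0.0] * N
for _ in range(2000):                # plain gradient descent on the controls
    g = gradient(u)
    u = [u[k] - 0.02 * g[k] for k in range(N)]

print(cost(u))  # well below the zero-control cost
```

This is exactly backpropagation through time; for this linear-quadratic instance the same optimum is of course available in closed form from the Riccati recursion.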
Optimal Estimation and the Kalman Filter
• Goal: from a sequence of noisy measurements, estimate the true dynamics.
• Intimately tied to the problem of optimal control.
dynamics: $x_{k+1} = A x_k + w_k$
observation: $y_k = H x_k + v_k$
where $w_k \sim \mathcal N(0, S)$, $v_k \sim \mathcal N(0, P)$, $x_0 \sim \mathcal N(\hat x_0, \Sigma_0)$, and $A, H, S, P, \hat x_0, \Sigma_0$ are known.
⇒ Goal: estimate the probability distribution of $x_k$ given $y_0, \dots, y_{k-1}$:
$$p_k = p(x_k \mid y_0, \dots, y_{k-1}), \qquad p_0 = \mathcal N(\hat x_0, \Sigma_0)$$
33
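To make the setup concrete, here is a minimal simulation sketch of such a linear-Gaussian model; the particular $A, H, S, P$ below (a constant-velocity system observed in position only) are illustrative assumptions, not values from the talk.

```python
import numpy as np

# Illustrative linear-Gaussian model: x_{k+1} = A x_k + w_k, y_k = H x_k + v_k,
# with w_k ~ N(0, S) and v_k ~ N(0, P).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])  # position/velocity dynamics
H = np.array([[1.0, 0.0]])              # observe position only
S = 0.01 * np.eye(2)                    # process-noise covariance
P = 0.1 * np.eye(1)                     # observation-noise covariance

def simulate(n, x0):
    """Roll the model forward n steps, returning states and observations."""
    xs, ys = [x0], []
    for _ in range(n):
        ys.append(H @ xs[-1] + rng.multivariate_normal(np.zeros(1), P))
        xs.append(A @ xs[-1] + rng.multivariate_normal(np.zeros(2), S))
    return np.array(xs), np.array(ys)

xs, ys = simulate(50, np.zeros(2))
```

The estimation problem is then to recover the distribution of each `xs[k]` from the noisy `ys` alone, which is what the filter derived next does.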
Optimal Estimation and the Kalman Filter
Using properties of multivariate Gaussians, it can be shown that
$$p_{k+1} = p(x_{k+1} \mid y_0, \dots, y_k) = \mathcal N(\hat x_{k+1}, \Sigma_{k+1})$$
where
$$\hat x_{k+1} = A \hat x_k + A \Sigma_k H^T (P + H \Sigma_k H^T)^{-1} (y_k - H \hat x_k) \tag{11}$$
and
$$\Sigma_{k+1} = S + A \Sigma_k A^T - A \Sigma_k H^T (P + H \Sigma_k H^T)^{-1} H \Sigma_k A^T \tag{12}$$
This is the Kalman filter.
Recall the Riccati equation for LQR:
$$V_k = Q + A^T V_{k+1} A - A^T V_{k+1} B (R + B^T V_{k+1} B)^{-1} B^T V_{k+1} A$$
34
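Updates (11) and (12) translate directly into code. The sketch below (with illustrative placeholder matrices, not values from the talk) runs the recursion on simulated data and checks that $\Sigma_k$ converges to a fixed point, mirroring the structural similarity to the LQR Riccati recursion.

```python
import numpy as np

# Direct transcription of updates (11)-(12) for the model
# x_{k+1} = A x_k + w_k, y_k = H x_k + v_k.
def kalman_step(xhat, Sigma, y, A, H, S, P):
    """Map the mean/covariance of p(x_k | y_0..y_{k-1}) to those of
    p(x_{k+1} | y_0..y_k)."""
    K = A @ Sigma @ H.T @ np.linalg.inv(P + H @ Sigma @ H.T)  # Kalman gain
    xhat_next = A @ xhat + K @ (y - H @ xhat)                 # eq. (11)
    Sigma_next = S + A @ Sigma @ A.T - K @ H @ Sigma @ A.T    # eq. (12)
    return xhat_next, Sigma_next

# Illustrative placeholder model: constant-velocity dynamics, position observed.
rng = np.random.default_rng(1)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
S, P = 0.01 * np.eye(2), 0.1 * np.eye(1)

x, xhat, Sigma = np.zeros(2), np.zeros(2), np.eye(2)
for _ in range(300):
    y = H @ x + rng.multivariate_normal(np.zeros(1), P)
    xhat, Sigma = kalman_step(xhat, Sigma, y, A, H, S, P)
    x = A @ x + rng.multivariate_normal(np.zeros(2), S)

# Like V_k in the LQR Riccati recursion, Sigma converges to a fixed point.
```

Note that the covariance recursion (12) does not depend on the data $y_k$ at all, so $\Sigma_k$ (and hence the gain $K$) can be precomputed offline, exactly as the LQR value matrices $V_k$ can.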
Conclusion
What we covered today
• Bellman equations and dynamic programming
• Hamilton-Jacobi-Bellman equations
• Linear-Quadratic-Gaussian, Linear-Quadratic Regulator, and the Riccati equations
• Pontryagin's maximum principle
• Kalman filter
What we didn't cover
• Solving non-linear optimal control problems using linear relaxations
• The duality between optimal control and optimal estimation
35
Any questions?
35
Thank you!
35
References
[1] Kenji Doya, Shin Ishii, Alexandre Pouget, and Rajesh P. N. Rao. Bayesian Brain: Probabilistic Approaches to Neural Coding. MIT Press, 2007.
[2] A. Heydari. Revisiting approximate dynamic programming and its convergence. IEEE Transactions on Cybernetics, 44(12):2733–2743, 2014.
[3] Ali Heydari. Convergence analysis of policy iteration. CoRR, abs/1505.05216, 2015. URL http://arxiv.org/abs/1505.05216.
36