4SC000 Q2 2017-2018
Optimal Control and Dynamic Programming
Duarte Antunes
Outline
• Shortest paths in graphs
• Dynamic programming
• Dijkstra’s and A* algorithms
• Certainty equivalent control
Graph
[Figure: example graph with nodes 1 to 6 joined by weighted edges]
Weighted Graph
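• Nodes $V := \{1, \dots, n\}$
• Edges $E := \{(i_1, j_1), \dots, (i_r, j_r) \mid i_1, \dots, i_r, j_1, \dots, j_r \in V\}$
• Weights $w_{ij} \geq 0$, $(i, j) \in E$
• Undirected if $(i, j) \in E$ implies $(j, i) \in E$ and $w_{ij} = w_{ji}$
[Figure: the example graph drawn as an undirected graph (left) and as a directed graph (right)]
Example: $E = \{(3, 6), (2, 3), \dots\}$, $w_{36} = 7$, $w_{23} = 5$, ...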
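A minimal Python sketch of one way to store a weighted graph: a dictionary of dictionaries, so that `w[i][j]` holds $w_{ij}$. The edge list is an assumption: only $w_{36} = 7$ and $w_{23} = 5$ appear on this slide, and the remaining edges are reconstructed from the Dijkstra iteration table in Example II. The helper names `make_graph` and `is_undirected` are hypothetical.

```python
# Hypothetical encoding of the six-node example as {i: {j: w_ij}}; the edge list
# is an assumption reconstructed from Example II (directed edges i -> j).
EDGES = [(1, 2, 1), (1, 3, 8), (1, 4, 6), (2, 3, 5),
         (2, 4, 3), (3, 6, 7), (4, 5, 3), (5, 6, 4)]


def make_graph(edges, n, undirected=False):
    """Return w with w[i][j] = w_ij for every edge (i, j) in E."""
    w = {i: {} for i in range(1, n + 1)}  # nodes V = {1, ..., n}
    for i, j, wij in edges:
        if wij < 0:
            raise ValueError("weights must satisfy w_ij >= 0")
        w[i][j] = wij
        if undirected:
            w[j][i] = wij  # (j, i) is also an edge, with w_ji = w_ij
    return w


def is_undirected(w):
    """True if every edge (i, j) has a reverse edge (j, i) with the same weight."""
    return all(w[j].get(i) == wij for i in w for j, wij in w[i].items())


directed = make_graph(EDGES, 6)
undirected = make_graph(EDGES, 6, undirected=True)
print(directed[3][6], directed[2][3])                      # 7 5
print(is_undirected(directed), is_undirected(undirected))  # False True
```

Storing only the existing edges keeps memory proportional to $|E|$ rather than $n^2$.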
Applications
2
Graphs model networks (road, social, transportation, etc.) and can be found in numerous applications
Shortest path problem
3
Find a path from an initial node to a destination node in a weighted graph with minimum length (sum of the weights of its edges).
[Figure: the example graph with the initial node and the final node marked; the shortest path has minimum length 11]
Can we use the DP algorithm to find the shortest path?
Discussion
4
• Computing an optimal path in a transition diagram can be seen as computing the shortest path from the nodes at stage $0$ to the node at stage $h + 1$ of the following weighted graph:
[Figure: transition diagram with stages $0, 1, \dots, h$, arrow costs $c^k_{ij}$ from state $i$ at stage $k$ to state $j$ at stage $k + 1$, and terminal costs $c^h_1, \dots, c^h_{n_h}$ on the arrows from the states at stage $h$ to an artificial node at an artificial stage $h + 1$]
• For graphs with this structure we already know how to use DP to compute shortest paths.
• Adjustments are needed for general graphs (e.g. cycles may occur) but DP can still be used to provide the shortest path, as we show next.
Dynamic programming formulation
5
Given a weighted graph, construct a transition diagram:
• $h = n - 1$ stages, $n$ states at the decision stages, and only the destination at the terminal stage.
• Make $c^k_{ij} = w_{ij}$, with $w_{ij} = \infty$ if there is no link from $i$ to $j$, and $w_{ii} = 0$.
[Figure: a four-node weighted graph with the initial node and the destination marked, and the corresponding transition diagram over stages $k = 0, 1, 2, 3$ with states $x_k \in \{1, 2, 3, 4\}$]
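The construction above can be sketched as follows (a minimal illustration, not the course's implementation). Because the four-node graph in the figure is not recoverable, it uses the directed six-node graph whose edges are reconstructed from Example II; the names `stage_costs` and `dp_costs_to_go` are hypothetical. `J[k][i]` is the cost-to-go of state `i` at stage `k`.

```python
import math

# Assumed example: the directed six-node graph with edges reconstructed from
# Example II (the four-node graph in the figure is not recoverable here).
EDGES = [(1, 2, 1), (1, 3, 8), (1, 4, 6), (2, 3, 5),
         (2, 4, 3), (3, 6, 7), (4, 5, 3), (5, 6, 4)]
N, DEST = 6, 6


def stage_costs(edges, n):
    """c_ij = w_ij, w_ii = 0, and w_ij = inf if there is no link from i to j."""
    c = [[math.inf] * (n + 1) for _ in range(n + 1)]  # row/column 0 unused
    for i in range(1, n + 1):
        c[i][i] = 0
    for i, j, wij in edges:
        c[i][j] = wij
    return c


def dp_costs_to_go(edges, n, dest):
    """Backward DP over h = n - 1 stages; J[k][i] is the cost-to-go at stage k."""
    c = stage_costs(edges, n)
    h = n - 1
    J = [[math.inf] * (n + 1) for _ in range(h + 1)]
    J[h][dest] = 0  # only the destination is a state at the terminal stage
    for k in range(h - 1, -1, -1):
        for i in range(1, n + 1):
            J[k][i] = min(c[i][j] + J[k + 1][j] for j in range(1, n + 1))
    return J


J = dp_costs_to_go(EDGES, N, DEST)
print(J[0][1:])  # [11, 10, 7, 7, 4, 0]
print(J[3][1:])  # [15, 12, 7, 7, 4, 0]
```

The zero self-loop $w_{ii} = 0$ is what lets a state "wait" at no cost, so `J[k][i]` is the cost of the shortest path with at most $n - 1 - k$ hops.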
Dynamic programming solution
6
Apply the DP algorithm to this transition diagram
• The costs-to-go at stage $k$ are the costs of the shortest paths with at most $n - 1 - k$ hops. In particular, the costs-to-go at the initial stage are the optimal costs for each initial condition.
• To find an optimal path, follow the policy for a given initial state.
• The cost-to-go at stage $0$ of a given state is infinite if there is no path from that initial state to the destination.
[Figure: the example graph (initial node and destination marked) and the costs-to-go obtained at each stage $k$ for each state $x_k$]
The implementation can be made more efficient, and one does not need to construct the transition diagram first. Moreover, one can stop as soon as the costs-to-go remain unchanged from one stage to the next.
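A sketch of the more efficient variant mentioned above, on the same assumed six-node graph (edges reconstructed from Example II): the costs-to-go are updated without building an explicit transition diagram, the loop stops as soon as a sweep leaves them unchanged, and the stored best next node is the policy that is followed to recover an optimal path. The function names are hypothetical.

```python
import math

# Assumed example graph (edges reconstructed from Example II), w[i][j] = w_ij.
W = {1: {2: 1, 3: 8, 4: 6}, 2: {3: 5, 4: 3}, 3: {6: 7},
     4: {5: 3}, 5: {6: 4}, 6: {}}


def shortest_costs(w, dest):
    """Iterate J(i) <- min(J(i), min_j w_ij + J(j)) until nothing changes."""
    J = {i: math.inf for i in w}
    J[dest] = 0
    policy = {}
    for _ in range(len(w) - 1):  # at most n - 1 sweeps are needed
        changed = False
        newJ = dict(J)
        for i in w:
            for j, wij in w[i].items():
                if wij + J[j] < newJ[i]:
                    newJ[i] = wij + J[j]
                    policy[i] = j  # best next node found so far
                    changed = True
        J = newJ
        if not changed:  # costs-to-go unchanged: stop early
            break
    return J, policy


def follow_policy(policy, start, dest):
    """Recover an optimal path by applying the policy from the initial state."""
    path = [start]
    while path[-1] != dest:
        path.append(policy[path[-1]])
    return path


J, policy = shortest_costs(W, 6)
print(J[1], follow_policy(policy, 1, 6))  # 11 [1, 2, 4, 5, 6]
```

Each sweep is one DP stage (keeping the old value plays the role of $w_{ii} = 0$); for this graph the loop stops after five sweeps because the fifth changes nothing.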
Example
7
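Another example, for an undirected graph:
[Figure: the six-node example graph, treated as undirected, with node 6 as the destination]
Costs-to-go at stages k = 0, 1, ..., 5 (∞: no path within the remaining number of hops; -: not a state at the terminal stage, where only the destination is present):
State 1: 11  11  13  15  ∞  -
State 2: 10  10  10  12  ∞  -
State 3:  7   7   7   7  7  -
State 4:  7   7   7   7  ∞  -
State 5:  4   4   4   4  4  -
State 6:  0   0   0   0  0  0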
8
Shortest paths in road networks
What is the shortest distance from Bucharest to Lugoj?
[Figure: Road map of Romania, with cities as nodes and road distances in km as edge weights]
9
Shortest paths in road networks
504 km (Route: Bucharest, Pitesti, Craiova, Dobreta, Mehadia, and Lugoj)
[Figure: the shortest route plotted on the map]
10
Robot path planning
What is the shortest path for a robot to go from point A to B?
[Figure: environment with obstacles, start point A and goal point B]
11
Assumptions
• It takes 1 distance unit to move horizontally or vertically between adjacent nodes and $\sqrt{2}$ units to move diagonally.
• Distances to obstacle nodes are infinite.
• The distance between two diagonally adjacent nodes that are both adjacent to the same obstacle node is infinite.
[Figure: grid of nodes with edge weights 1 (horizontal and vertical moves) and $\sqrt{2}$ (diagonal moves)]
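The three assumptions translate directly into a neighbor function on an occupancy grid. This is a sketch under assumptions: the 3 x 3 map is hypothetical, and "adjacent to the same obstacle node" is read as "one of the two nodes that share an edge with both endpoints of the diagonal move is an obstacle".

```python
import math

# Hypothetical 3x3 map: '#' is an obstacle node, '.' a free node.
GRID = ["...",
        ".#.",
        "..."]


def neighbors(grid, r, c):
    """Yield (node, distance) for the finite-distance moves from node (r, c)."""
    rows, cols = len(grid), len(grid[0])

    def free(rr, cc):
        return 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] != "#"

    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0) or not free(r + dr, c + dc):
                continue  # distance to an obstacle node is infinite
            if dr != 0 and dc != 0:
                # Diagonal move: infinite if a node adjacent to both ends is an obstacle.
                if free(r + dr, c) and free(r, c + dc):
                    yield (r + dr, c + dc), math.sqrt(2)
            else:
                yield (r + dr, c + dc), 1.0  # horizontal or vertical move


print(sorted(neighbors(GRID, 0, 0)))  # [((0, 1), 1.0), ((1, 0), 1.0)]
print(sorted(neighbors(GRID, 0, 1)))  # [((0, 0), 1.0), ((0, 2), 1.0)]
```

Infinite distances are represented simply by omitting the move, so any shortest path routine only ever sees the finite edges.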
12
Robot path planning
What is the shortest path for a robot to go from point A to B?
[Figure: grid with obstacles and the shortest path from A to B]
13
Robot path planning
Simpler example to show the costs-to-go:
[Figure: grid in which each free node is labelled with its cost-to-go to the goal, from 0.00 at the goal (1.00 and 1.41 at its straight and diagonal neighbors) up to 11.00 at the farthest node]
Side remark: the cost-to-go can be viewed as a Lyapunov function, and the policy can be obtained by following the direction of maximum decrease of this function.
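A sketch of how such costs-to-go could be computed and used on a small hypothetical map (not the one in the figure): Dijkstra's algorithm run from the goal B gives the cost-to-go of every free node (moves are symmetric), and the robot then moves to the neighbor with the largest decrease of the cost-to-go per unit distance, in the spirit of the side remark. The names `cost_to_go` and `descend` are hypothetical.

```python
import heapq
import math

# Hypothetical 4x5 map ('#' obstacle); the goal B is node (2, 4).
GRID = [".....",
        ".##..",
        ".....",
        "....."]
GOAL = (2, 4)


def neighbors(grid, r, c):
    """8-connected moves: 1 straight, sqrt(2) diagonal, no cutting obstacle corners."""
    rows, cols = len(grid), len(grid[0])

    def free(rr, cc):
        return 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] != "#"

    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0) or not free(r + dr, c + dc):
                continue
            if dr and dc:
                if free(r + dr, c) and free(r, c + dc):
                    yield (r + dr, c + dc), math.sqrt(2)
            else:
                yield (r + dr, c + dc), 1.0


def cost_to_go(grid, goal):
    """Dijkstra from the goal; moves are symmetric, so d(x) is x's cost-to-go."""
    J = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > J[x]:
            continue  # stale heap entry
        for y, w in neighbors(grid, *x):
            if d + w < J.get(y, math.inf):
                J[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return J


def descend(grid, J, start):
    """Move to the neighbor with the largest decrease of J per unit distance.

    Since J(x) <= w(x, y) + J(y) for every move, the rate (J(x) - J(y)) / w is at
    most 1, with equality exactly along optimal moves.
    """
    path = [start]
    while J[path[-1]] > 0:
        x = path[-1]
        y, _ = max(neighbors(grid, *x), key=lambda yw: (J[x] - J[yw[0]]) / yw[1])
        path.append(y)
    return path


J = cost_to_go(GRID, GOAL)
print(round(J[(0, 0)], 2))  # 5.41
print(descend(GRID, J, (0, 0)))
```

The descent always decreases the cost-to-go strictly, which is exactly the Lyapunov-like behavior the remark refers to; ties between equally optimal moves are broken arbitrarily.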
14
Time-varying graphs
How to design a shortest path from A to B when the obstacles are moving?
[Figure: positions of the moving obstacles at times $t = 0, 1, 2, \dots, T$, with the initial position and the final position marked]
15
Time-varying graphs
1. Consider the set of static graphs, one for each time step.
[Figure: the static graphs at times $t = 0, 1, 2, \dots, T$]
16
Time-varying graphs
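[Figure: the static graphs at $t = 0, 1, 2$ stacked into a single 3D graph]
2. Build a time-invariant graph in 3D.
3. Compute the shortest path for the 3D graph. Initial node: the initial node at time $t = 0$. Final node: the final node at time $t = T$.
Example
[Figure: example of the 3D graph, with edge weights 1 and $\sqrt{2}$]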
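A minimal sketch of steps 1-3 for a hypothetical one-dimensional corridor (cells 0 to 4) with an obstacle that occupies cell 2 at times 1 and 2. Assumptions not in the slides: in each time step the robot may move to an adjacent cell or wait, moving costs 1 and waiting costs 0. Because every edge of the 3D graph goes from time layer $t$ to $t + 1$, the graph is acyclic and one forward sweep over $t$ computes the shortest paths.

```python
import math

# Hypothetical 1-D corridor with cells 0..4 and horizon T = 6; OBST[t] is the set
# of cells occupied by the moving obstacle at time t.
CELLS = range(5)
T = 6
OBST = {1: {2}, 2: {2}}


def time_expanded_path(start, goal, T, obst):
    """Shortest path in the 3D graph with nodes (x, t) and edges (x, t) -> (x', t + 1).

    Assumed costs: moving to an adjacent cell costs 1, waiting costs 0.
    """
    d = {(start, 0): 0}          # initial node: start cell at time t = 0
    parent = {}
    for t in range(T):           # edges only go from layer t to layer t + 1
        for x in CELLS:
            if (x, t) not in d:
                continue
            for nx in (x - 1, x, x + 1):
                if nx not in CELLS or nx in obst.get(t + 1, set()):
                    continue     # outside the corridor or occupied at t + 1
                cost = d[(x, t)] + abs(nx - x)
                if cost < d.get((nx, t + 1), math.inf):
                    d[(nx, t + 1)] = cost
                    parent[(nx, t + 1)] = (x, t)
    if (goal, T) not in d:       # final node: goal cell at time t = T
        return math.inf, []
    node, path = (goal, T), [(goal, T)]
    while node != (start, 0):
        node = parent[node]
        path.append(node)
    return d[(goal, T)], path[::-1]


cost, path = time_expanded_path(0, 4, T, OBST)
print(cost)  # 4
print(path)  # sequence of (cell, time) nodes from (0, 0) to (4, 6)
```

If the obstacle blocked cell 2 at every time step, the final node would be unreachable and the function would return an infinite cost.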
Outline
• Shortest paths in graphs
• Dynamic programming
• Dijkstra’s and A* algorithms
• Certainty equivalent control
17
Discussion
DP can be quite inefficient when computing a single optimal path in a large enough graph.
• For shortest path problems in graphs there are many alternative algorithms. We next describe Dijkstra’s algorithm and the A* algorithm.
[Figure: graph with nodes $1, \dots, n$, in which the initial node and the destination are joined by a short path of edges with weight 1, while all other edges have weights $> 2$]
• Figure example: DP searches the full space, which is not necessary to compute the optimal path.
18
Dijkstra’s algorithm: main ideas
• Iteratively generate shorter paths from the origin to every node.
• Update a list of nodes (wavefront) that can be explored next.
• New nodes are added to the wavefront based on cost: the neighbors of the node with the smallest distance to the origin are added.
source: wikipedia
19
Dijkstra’s algorithm
Initialization
• $d_i = \infty$ for $i \in V - \{p\}$, $d_p = 0$, and OPEN $= \{p\}$ ($p$: initial node, $t$: final node).
Steps
1. Remove a node $i$ from OPEN with the minimum estimate $d_i$. If $i = t$, stop; otherwise, execute step 2 for every node $j$ for which there is a path (arrow) from $i$ to $j$.
2. If $d_i + w_{ij} < d_j$: set $d_j = d_i + w_{ij}$, set $\pi(j) = i$, and place $j$ in OPEN if it is not there already. Otherwise, do not update $d_j$, $\pi(j)$.
3. After executing step 2 for all the nodes $j$ corresponding to out-neighbors of $i$, go to step 1.
Optimal path
• To keep track of the shortest paths, it suffices to save, for every node $i$, the next node $\pi(i)$ along the optimal path (discovered so far) leading back to the initial node.
• The optimal path is then given by $(i_0, i_1, \dots, i_L)$ with $i_L = t$, $i_{L-1} = \pi(t)$, ..., $i_0 = \pi(i_1)$, or equivalently $i_{\ell-1} = \pi(i_\ell)$ for $\ell \in \{1, 2, \dots, L\}$, where $L$ is such that $i_0 = p$.
• If OPEN is empty at a given step of the algorithm, then there is no path to the destination.
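A minimal Python sketch of the steps above, keeping OPEN as a plain set so that step 1 is a linear scan (a binary heap is the usual faster choice). The graph is the directed six-node example as reconstructed from Example II, which is an assumption; `trace` records the pairs $(i, d_i)$ in OPEN at the start of each iteration so the run can be compared with the Example II table.

```python
import math

# Assumed example: directed six-node graph (edges reconstructed from Example II).
W = {1: {2: 1, 3: 8, 4: 6}, 2: {3: 5, 4: 3}, 3: {6: 7},
     4: {5: 3}, 5: {6: 4}, 6: {}}


def dijkstra(w, p, t):
    """Dijkstra's algorithm as in the steps above; OPEN is a plain set."""
    d = {i: math.inf for i in w}  # d_i = inf for i in V - {p}
    d[p] = 0
    pi = {}                       # pi(j): next node on the way back to p
    OPEN = {p}
    trace = []                    # sorted (i, d_i) pairs in OPEN per iteration
    while OPEN:
        trace.append(sorted((i, d[i]) for i in OPEN))
        i = min(OPEN, key=lambda k: d[k])   # step 1: minimum estimate d_i
        OPEN.remove(i)
        if i == t:
            break
        for j, wij in w[i].items():         # step 2 for every out-neighbor j
            if d[i] + wij < d[j]:
                d[j] = d[i] + wij
                pi[j] = i
                OPEN.add(j)                 # no effect if already in OPEN
    if d[t] == math.inf:
        return math.inf, [], trace          # OPEN became empty: no path
    path = [t]
    while path[-1] != p:                    # i_{l-1} = pi(i_l)
        path.append(pi[path[-1]])
    return d[t], path[::-1], trace


cost, path, trace = dijkstra(W, 1, 6)
print(cost, path)   # 11 [1, 2, 4, 5, 6]
for k, pairs in enumerate(trace):
    print(k, pairs)
```

The printed trace lists the same OPEN contents, iteration by iteration, as the Example II table.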
20
Example I
[Figure: the graph from the previous discussion slide; the initial node 1 reaches the destination 3 through node 2 via two edges of weight 1, and all other edges have weights $> 2$]
Dijkstra’s algorithm requires only three iterations for this example:
Iteration: pairs $(i, d_i)$, $i \in$ OPEN
0: (1, 0)
1: (2, 1) + other pairs pertaining to other neighbors of node 1   [$\pi(2) = 1$]
2: (3, 2) + other pairs pertaining to other neighbors of nodes 1 and 2   [$\pi(3) = 2$]
The destination (final) node is then removed from OPEN, and the algorithm terminates.
21
Example II
[Figure: the six-node example graph]
Iteration: pairs $(i, d_i)$, $i \in$ OPEN
0: (1, 0)
1: (2, 1), (3, 8), (4, 6)   [$\pi(2) = 1$, $\pi(3) = 1$, $\pi(4) = 1$]
2: (3, 6), (4, 4)   [$\pi(3) = 2$, $\pi(4) = 2$]
3: (3, 6), (5, 7)   [$\pi(5) = 4$]
4: (5, 7), (6, 13)   [$\pi(6) = 3$]
5: (6, 11)   [$\pi(6) = 5$]
Optimal path (from end to start): $(6, \pi(6), \pi(\pi(6)), \dots, 1) = (6, 5, 4, 2, 1)$
22
Shortest paths in road networks
What is the shortest distance from Bucharest to Lugoj?
[Figure: Road map of Romania, with cities as nodes and road distances in km as edge weights]
23
Example III
Shortest path from Bucharest to Lugoj (the graph is undirected, so the algorithm is started at Lugoj)
Iteration: pairs $\{i, d_i\}$, $i \in$ OPEN
0: {Lugoj, 0}
1: {Mehadia, 70}, {Timisoara, 111}
2: {Timisoara, 111}, {Dobreta, 145}
3: {Dobreta, 145}, {Arad, 229}
4: {Arad, 229}, {Craiova, 265}
5: {Craiova, 265}, {Sibiu, 369}, {Zerind, 304}
6: {Sibiu, 369}, {Zerind, 304}, {Pitesti, 403}, {Rimnicu Vilcea, 411}
7: {Sibiu, 369}, {Pitesti, 403}, {Rimnicu Vilcea, 411}, {Oradea, 375}
8: {Pitesti, 403}, {Rimnicu Vilcea, 411}, {Oradea, 375}, {Fagaras, 468}
9: {Pitesti, 403}, {Rimnicu Vilcea, 411}, {Fagaras, 468}
10: {Rimnicu Vilcea, 411}, {Fagaras, 468}, {Bucharest, 504}
11: {Fagaras, 468}, {Bucharest, 504}
12: {Bucharest, 504}
From this data we can obtain $\pi$ and compute the optimal path.
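The same run can be reproduced with a heap-based variant of the algorithm, which skips outdated heap entries instead of updating them in place. The road list below pairs the distances printed on the map with the roads they label; this pairing is an assumption because the figure did not survive extraction, but it reproduces the iteration table above. `removed` lists the nodes in the order they leave OPEN.

```python
import heapq

# Roads as (city, city, km); the pairing of distances with roads is an assumption.
ROADS = [
    ("Arad", "Zerind", 75), ("Arad", "Sibiu", 140), ("Arad", "Timisoara", 118),
    ("Zerind", "Oradea", 71), ("Oradea", "Sibiu", 151),
    ("Timisoara", "Lugoj", 111), ("Lugoj", "Mehadia", 70),
    ("Mehadia", "Dobreta", 75), ("Dobreta", "Craiova", 120),
    ("Craiova", "Rimnicu Vilcea", 146), ("Craiova", "Pitesti", 138),
    ("Sibiu", "Fagaras", 99), ("Sibiu", "Rimnicu Vilcea", 80),
    ("Rimnicu Vilcea", "Pitesti", 97), ("Fagaras", "Bucharest", 211),
    ("Pitesti", "Bucharest", 101), ("Bucharest", "Giurgiu", 90),
    ("Bucharest", "Urziceni", 85), ("Urziceni", "Hirsova", 98),
    ("Hirsova", "Eforie", 86), ("Urziceni", "Vaslui", 142),
    ("Vaslui", "Iasi", 92), ("Iasi", "Neamt", 87),
]


def undirected_graph(edges):
    """Adjacency dict with each road usable in both directions."""
    g = {}
    for a, b, km in edges:
        g.setdefault(a, {})[b] = km
        g.setdefault(b, {})[a] = km
    return g


def dijkstra_heap(g, p, t):
    """Heap-based Dijkstra (t assumed reachable); outdated heap entries are skipped."""
    d, pi, heap, removed = {p: 0}, {}, [(0, p)], []
    while heap:
        di, i = heapq.heappop(heap)
        if di > d[i]:
            continue              # an older, larger estimate for node i
        removed.append(i)         # i leaves OPEN
        if i == t:
            break
        for j, wij in g[i].items():
            if di + wij < d.get(j, float("inf")):
                d[j], pi[j] = di + wij, i
                heapq.heappush(heap, (d[j], j))
    path = [t]
    while path[-1] != p:
        path.append(pi[path[-1]])
    return d[t], path, removed


cost, path, removed = dijkstra_heap(undirected_graph(ROADS), "Lugoj", "Bucharest")
print(cost)  # 504
print(path)  # ['Bucharest', 'Pitesti', 'Craiova', 'Dobreta', 'Mehadia', 'Lugoj']
```

Since the path is rebuilt from the destination back to the origin, it comes out in the order Bucharest to Lugoj, as in the route stated earlier.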
24
A*
• Similar to Dijkstra’s algorithm, but an estimate (heuristic) $h(i)$ of the distance from each node $i \in V$ to the destination is also taken into account when picking the node to be explored next. New nodes are added to the wavefront based on $d_i + h(i)$.
• If the heuristic (i) does not exceed the optimal cost from each node to the destination and (ii) is such that $h(i) \leq w_{ij} + h(j)$ for every $(i, j) \in E$, then an optimal path is found. Otherwise, there are no optimality guarantees.
• To run the A* algorithm under the two heuristic assumptions: 1. Change the weights to $\bar w_{ij} = w_{ij} + h(j) - h(i)$. 2. Run Dijkstra’s algorithm and get the optimal path. 3. Obtain the optimal cost in the original graph with weights $w_{ij}$ (along any path from $p$ to $t$, the cost with weights $\bar w_{ij}$ equals the original cost plus $h(t) - h(p)$).
• The general algorithm is given next; it can be run with or without these assumptions.
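A small sketch of the two conditions and of the weight change, using the assumed six-node graph (edges reconstructed from Example II) and a hypothetical heuristic `H` that never exceeds the true costs-to-go (11, 10, 7, 7, 4, 0 for nodes 1 to 6). The modified weights are nonnegative exactly when condition (ii) holds, and along any path the modified cost equals the original cost plus $h(t) - h(p)$, which is how step 3 recovers the original cost.

```python
# Assumed example graph (edges reconstructed from Example II) and a hypothetical
# heuristic H that never exceeds the true costs-to-go (11, 10, 7, 7, 4, 0).
W = {1: {2: 1, 3: 8, 4: 6}, 2: {3: 5, 4: 3}, 3: {6: 7},
     4: {5: 3}, 5: {6: 4}, 6: {}}
H = {1: 9, 2: 8, 3: 7, 4: 6, 5: 4, 6: 0}


def is_consistent(w, h):
    """Condition (ii): h(i) <= w_ij + h(j) for every edge (i, j)."""
    return all(h[i] <= wij + h[j] for i in w for j, wij in w[i].items())


def reweight(w, h):
    """Modified weights w_bar_ij = w_ij + h(j) - h(i) for Dijkstra's algorithm."""
    return {i: {j: wij + h[j] - h[i] for j, wij in w[i].items()} for i in w}


def path_cost(w, path):
    return sum(w[a][b] for a, b in zip(path, path[1:]))


Wb = reweight(W, H)
path = [1, 2, 4, 5, 6]
orig, new = path_cost(W, path), path_cost(Wb, path)
print(is_consistent(W, H), all(v >= 0 for r in Wb.values() for v in r.values()))  # True True
print(orig, new, new - H[path[-1]] + H[path[0]])  # 11 2 11
print(is_consistent(W, {**H, 5: 5}))  # False: h(5) = 5 > w_56 + h(6) = 4
```

Because the correction term $h(t) - h(p)$ is the same for every path from $p$ to $t$, the ranking of paths is unchanged by the reweighting.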
25
A*
Initialization
• $d_i = \infty$ for $i \in V - \{p\}$, $d_p = 0$, and OPEN $= \{p\}$ ($p$: initial node, $t$: final node).
Steps
1. Remove a node $i$ from OPEN with the minimum $d_i + h(i)$. If $i = t$, stop; otherwise, execute step 2 for every node $j$ for which there is a path (arrow) from $i$ to $j$.
2. If $d_i + w_{ij} < d_j$: set $d_j = d_i + w_{ij}$, set $\pi(j) = i$, and place $j$ in OPEN if it is not there already. Otherwise, do not update $d_j$, $\pi(j)$.
3. After executing step 2 for all the nodes $j$ corresponding to out-neighbors of $i$, go to step 1.
(Same algorithm as Dijkstra’s algorithm except for the use of $d_i + h(i)$ in step 1; the same remarks to find the optimal path as in slide 19 apply.)
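A sketch of the general A* algorithm above, with OPEN kept as a plain set; only step 1 differs from Dijkstra’s algorithm. The graph is the assumed six-node example (edges reconstructed from Example II). With $h = 0$ the algorithm reduces to Dijkstra’s algorithm, while the hypothetical exact heuristic (the true costs-to-go of this graph) only removes nodes on the optimal path, as the `expanded` lists show.

```python
import math

# Assumed example graph (edges reconstructed from Example II).
W = {1: {2: 1, 3: 8, 4: 6}, 2: {3: 5, 4: 3}, 3: {6: 7},
     4: {5: 3}, 5: {6: 4}, 6: {}}


def astar(w, h, p, t):
    """A* as described above: like Dijkstra, but step 1 minimizes d_i + h(i)."""
    d = {i: math.inf for i in w}
    d[p] = 0
    pi, OPEN, expanded = {}, {p}, []
    while OPEN:
        i = min(OPEN, key=lambda k: d[k] + h[k])   # step 1
        OPEN.remove(i)
        expanded.append(i)
        if i == t:
            break
        for j, wij in w[i].items():                # step 2
            if d[i] + wij < d[j]:
                d[j] = d[i] + wij
                pi[j] = i
                OPEN.add(j)  # a node removed earlier may re-enter OPEN
    if d[t] == math.inf:
        return math.inf, [], expanded
    path = [t]
    while path[-1] != p:
        path.append(pi[path[-1]])
    return d[t], path[::-1], expanded


zero = {i: 0 for i in W}                        # h = 0: plain Dijkstra
exact = {1: 11, 2: 10, 3: 7, 4: 7, 5: 4, 6: 0}  # h = true cost-to-go (hypothetical)
print(astar(W, zero, 1, 6)[2])   # [1, 2, 4, 3, 5, 6]
print(astar(W, exact, 1, 6)[2])  # [1, 2, 4, 5, 6]
```

Both runs return the optimal cost 11; the better heuristic simply avoids removing node 3 from OPEN.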
26
A* and Dijkstra’s algorithm
A* is typically much faster if we have a good heuristic (which might not be easy to find, especially if we require it to satisfy the two conditions discussed before).
source: wikipedia
[Figure: nodes explored by Dijkstra’s algorithm (left) and by A* (right)]
27
Example
[Figure: Road map of Romania, with road distances in km]
Straight-line distance to Bucharest, $h(i)$:
Neamt 234, Iasi 226, Vaslui 199, Urziceni 80, Hirsova 151, Eforie 161,
Bucharest 0, Giurgiu 77, Pitesti 98, Craiova 160, Fagaras 178, Sibiu 253,
Rimnicu Vilcea 193, Lugoj 244,
Mehadia 241, Dobreta 242, Timisoara 329,
Arad 366, Zerind 374, Oradea 380
28
Example
Iteration: pairs $\{i, d_i\}$ in OPEN (distances computed with the modified weights $\bar w_{ij}$)
0: {Lugoj, 0}
1: {Mehadia, 67}, {Timisoara, 196}
2: {Timisoara, 196}, {Dobreta, 143}
3: {Timisoara, 196}, {Craiova, 181}
4: {Timisoara, 196}, {Pitesti, 257}, {Rim. Vilcea, 360}
5: {Pitesti, 257}, {Rim. Vilcea, 360}, {Arad, 351}
6: {Rim. Vilcea, 360}, {Arad, 351}, {Bucharest, 260}
1. Change the weights to $\bar w_{ij} = w_{ij} + h(j) - h(i)$. 2. Run Dijkstra’s algorithm and get the optimal path. 3. Obtain the optimal cost in the original graph with weights $w_{ij}$: here $260 + h(\text{Lugoj}) - h(\text{Bucharest}) = 260 + 244 - 0 = 504$ km.
From this data we can obtain $\pi$ and compute the optimal path.
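The iteration table above can be reproduced by following the three steps literally: build the modified weights from the straight-line distances, run a heap-based Dijkstra from Lugoj, and add $h(\text{Lugoj}) - h(\text{Bucharest})$ back. The road list, which pairs the distances printed on the map with the roads they label, is an assumption since the figure did not survive extraction; the heuristic values are those listed on the previous slide. Function names are hypothetical.

```python
import heapq

# Roads as (city, city, km); the pairing of distances with roads is an assumption.
ROADS = [
    ("Arad", "Zerind", 75), ("Arad", "Sibiu", 140), ("Arad", "Timisoara", 118),
    ("Zerind", "Oradea", 71), ("Oradea", "Sibiu", 151),
    ("Timisoara", "Lugoj", 111), ("Lugoj", "Mehadia", 70),
    ("Mehadia", "Dobreta", 75), ("Dobreta", "Craiova", 120),
    ("Craiova", "Rimnicu Vilcea", 146), ("Craiova", "Pitesti", 138),
    ("Sibiu", "Fagaras", 99), ("Sibiu", "Rimnicu Vilcea", 80),
    ("Rimnicu Vilcea", "Pitesti", 97), ("Fagaras", "Bucharest", 211),
    ("Pitesti", "Bucharest", 101), ("Bucharest", "Giurgiu", 90),
    ("Bucharest", "Urziceni", 85), ("Urziceni", "Hirsova", 98),
    ("Hirsova", "Eforie", 86), ("Urziceni", "Vaslui", 142),
    ("Vaslui", "Iasi", 92), ("Iasi", "Neamt", 87),
]
# Straight-line distances to Bucharest, as listed on the slide.
H = {"Neamt": 234, "Iasi": 226, "Vaslui": 199, "Urziceni": 80, "Hirsova": 151,
     "Eforie": 161, "Bucharest": 0, "Giurgiu": 77, "Pitesti": 98,
     "Craiova": 160, "Fagaras": 178, "Sibiu": 253, "Rimnicu Vilcea": 193,
     "Lugoj": 244, "Mehadia": 241, "Dobreta": 242, "Timisoara": 329,
     "Arad": 366, "Zerind": 374, "Oradea": 380}


def modified_graph(edges, h):
    """Step 1: w_bar_ij = w_ij + h(j) - h(i), for both directions of each road."""
    g = {}
    for a, b, km in edges:
        g.setdefault(a, {})[b] = km + h[b] - h[a]
        g.setdefault(b, {})[a] = km + h[a] - h[b]
    return g


def dijkstra_heap(g, p, t):
    """Step 2: heap-based Dijkstra; returns d_t and the (node, d_i) removal order."""
    d, heap, removed = {p: 0}, [(0, p)], []
    while heap:
        di, i = heapq.heappop(heap)
        if di > d[i]:
            continue  # outdated heap entry
        removed.append((i, di))
        if i == t:
            break
        for j, wij in g[i].items():
            if di + wij < d.get(j, float("inf")):
                d[j] = di + wij
                heapq.heappush(heap, (d[j], j))
    return d[t], removed


d_bar, removed = dijkstra_heap(modified_graph(ROADS, H), "Lugoj", "Bucharest")
print(removed)
print(d_bar, d_bar + H["Lugoj"] - H["Bucharest"])  # 260 504  (step 3)
```

All modified weights are nonnegative here, which is condition (ii) for these straight-line distances, so running Dijkstra’s algorithm on them is legitimate.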
29
Discussion
• For large graphs one cannot even store all the nodes and initialise $d_i$, but we can still run the algorithm if we keep track of a list of closed nodes (those removed in step 1, see slide 19) so that they are not visited again (on slide 19 this is ensured by the test $d_i + w_{ij} < d_j$).
• If optimality is not needed, there are many more graph search algorithms, e.g., breadth-first search and depth-first search (see label correcting methods in Bertsekas’s book, Ch. 2).
• For robot motion planning, Dijkstra’s algorithm and A* are in general naive:
  • construct nodes as we move along (so that the graph is only implicit);
  • random placement of nodes is in general better.
• A popular method that improves upon the previous methods based on these two remarks is the rapidly-exploring random tree (RRT) (see LaValle’s book).
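A sketch of the closed-list idea on a graph that is too large to store: the unbounded 4-connected integer grid with unit weights and a hypothetical wall of obstacle cells. Nodes are generated on demand by `successors`, $d_i$ is stored only for discovered nodes, and `closed` holds the nodes already removed in step 1. All names and the map are illustrative assumptions.

```python
import heapq

# Hypothetical implicit graph: the unbounded 4-connected integer grid with unit
# weights and a finite wall of obstacle cells at x = 1, -2 <= y <= 2.
WALL = {(1, y) for y in range(-2, 3)}


def successors(node):
    """Generate the out-neighbors of a node on demand (the graph is never stored)."""
    x, y = node
    for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if nxt not in WALL:
            yield nxt, 1


def dijkstra_implicit(start, goal):
    """Dijkstra with d_i stored only for discovered nodes and a CLOSED set.

    On an infinite graph this only terminates if the goal is reachable.
    """
    d = {start: 0}
    closed = set()
    heap = [(0, start)]
    while heap:
        di, i = heapq.heappop(heap)
        if i in closed:
            continue                 # outdated entry of an already closed node
        closed.add(i)                # i is removed from OPEN (step 1)
        if i == goal:
            return di, len(closed)
        for j, wij in successors(i):
            if j not in closed and di + wij < d.get(j, float("inf")):
                d[j] = di + wij
                heapq.heappush(heap, (d[j], j))
    return float("inf"), len(closed)


cost, n_closed = dijkstra_implicit((0, 0), (3, 0))
print(cost)  # 9
```

The robot has to go around the wall (up to $y = 3$ and back), so the cost is 9 instead of the obstacle-free 3, and only the finitely many nodes closer than 9 are ever stored.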
30
Discussion
• Dijkstra’s algorithm and other search algorithms (e.g. A*) are typically computationally more efficient than DP for computing optimal paths.
• DP explores every node, providing the optimal paths from every node to the destination. This is inefficient when we are interested in one optimal path.
• Why dynamic programming then? It provides a policy, which allows us to cope with disturbances (see lecture 2).
• We discuss next how to use Dijkstra’s algorithm to provide the optimal policy in real time (online).
• In Appendix A, Dijkstra’s algorithm is used to obtain the same optimal policy that was obtained with DP in the first lecture.
• Thus, again, why DP? Stochastic DP! (plus other advantages)
31
Edsger W. Dijkstra
Historical note
• Edsger W. Dijkstra was a professor at TU Eindhoven from 1962 to 1984.
What's the shortest way to travel from Rotterdam to Groningen? It is the algorithm for the shortest path, which I designed in about 20 minutes. One morning I was shopping in Amsterdam with my young fiancée, and tired, we sat down on the café terrace to drink a cup of coffee and I was just thinking about whether I could do this, and I then designed the algorithm for the shortest path. As I said, it was a 20-minute invention. In fact, it was published in 1959, three years later. The publication is still quite nice. One of the reasons that it is so nice was that I designed it without pencil and paper. Without pencil and paper you are almost forced to avoid all avoidable complexities. Eventually that algorithm became, to my great amazement, one of the cornerstones of my fame. Edsger W. Dijkstra (1930-2002)
Outline
• Shortest paths in graphs
• Dynamic programming
• Dijkstra’s and A* algorithms
• Certainty equivalent control
32
Shortest paths in graphs and policies
• A transition diagram is just a weighted graph, and therefore we can compute optimal paths with methods for computing shortest paths in graphs (e.g. Dijkstra’s, A*).
[Figure: transition diagram from the initial stage 0 to the final stage h]
• Doing this for every stage and every state, and taking the first decision of each optimal path, we obtain the optimal policy! (the function that gives, for each state, the first decision of the optimal path from that state to the last stage)
[Figure: example of the resulting optimal policy (see also Appendix A)]
• However, this is typically computationally less efficient than DP.
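A sketch of this idea on the assumed six-node graph (edges reconstructed from Example II), treating the graph itself as the transition diagram and ignoring stages: a shortest path is computed from every state to the destination and only its first decision is kept. The function names are hypothetical.

```python
import heapq

# Assumed example graph (edges reconstructed from Example II); destination 6.
W = {1: {2: 1, 3: 8, 4: 6}, 2: {3: 5, 4: 3}, 3: {6: 7},
     4: {5: 3}, 5: {6: 4}, 6: {}}


def shortest_path(w, p, t):
    """Heap-based Dijkstra; returns the optimal node sequence from p to t, or None."""
    d, pi, heap = {p: 0}, {}, [(0, p)]
    while heap:
        di, i = heapq.heappop(heap)
        if di > d[i]:
            continue  # outdated heap entry
        if i == t:
            path = [t]
            while path[-1] != p:
                path.append(pi[path[-1]])
            return path[::-1]
        for j, wij in w[i].items():
            if di + wij < d.get(j, float("inf")):
                d[j], pi[j] = di + wij, i
                heapq.heappush(heap, (d[j], j))
    return None


def policy_from_paths(w, t):
    """For every state, keep only the first decision of its optimal path to t."""
    mu = {}
    for x in w:
        path = shortest_path(w, x, t)
        if path is not None and len(path) > 1:
            mu[x] = path[1]
    return mu


print(policy_from_paths(W, 6))  # {1: 2, 2: 4, 3: 6, 4: 5, 5: 6}
```

One shortest path computation per state is what makes this offline approach less efficient than a single DP pass.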
33
Certainty equivalent control
• Yet, we can implement the method just described (using, e.g., Dijkstra’s algorithm) online:
1. Compute the optimal path for the initial state and take the first decision.
2. If no disturbance occurred, use the next decision along the optimal path; otherwise, recompute the optimal path (online!) from the current state and apply its first decision.
[Figure: transition diagram from the initial stage 0 to the final stage h, in which the optimal path is recomputed after each disturbance]
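A minimal simulation sketch of the two steps above on the assumed six-node graph (edges reconstructed from Example II). The disturbance model `push_to_3` is hypothetical: it makes the first transition end in node 3 instead of node 2, after which the optimal path is recomputed from the node actually reached.

```python
import heapq

# Assumed example graph (edges reconstructed from Example II).
W = {1: {2: 1, 3: 8, 4: 6}, 2: {3: 5, 4: 3}, 3: {6: 7},
     4: {5: 3}, 5: {6: 4}, 6: {}}


def shortest_path(w, p, t):
    """Heap-based Dijkstra; returns the optimal node sequence from p to t, or None."""
    d, pi, heap = {p: 0}, {}, [(0, p)]
    while heap:
        di, i = heapq.heappop(heap)
        if di > d[i]:
            continue
        if i == t:
            path = [t]
            while path[-1] != p:
                path.append(pi[path[-1]])
            return path[::-1]
        for j, wij in w[i].items():
            if di + wij < d.get(j, float("inf")):
                d[j], pi[j] = di + wij, i
                heapq.heappush(heap, (d[j], j))
    return None


def certainty_equivalent(w, x0, t, disturbance):
    """Follow the current optimal path; recompute it online after a disturbance.

    disturbance(k, x, u) is a hypothetical model returning the node actually
    reached at step k + 1 when decision u is applied in node x.
    """
    x, k, replans = x0, 0, 0
    plan = shortest_path(w, x0, t)
    visited = [x0]
    while x != t:
        if plan is None:
            raise RuntimeError(f"no path from {x} to {t}")
        u = plan[1]                             # next decision along the plan
        x_next = disturbance(k, x, u)
        if x_next == u:
            plan = plan[1:]                     # no disturbance: keep the plan
        else:
            plan = shortest_path(w, x_next, t)  # recompute from the new state
            replans += 1
        x, k = x_next, k + 1
        visited.append(x)
    return visited, replans


def no_disturbance(k, x, u):
    return u


def push_to_3(k, x, u):
    # Hypothetical disturbance: the first transition ends in node 3 instead of 2.
    return 3 if (k == 0 and u == 2) else u


print(certainty_equivalent(W, 1, 6, no_disturbance))  # ([1, 2, 4, 5, 6], 0)
print(certainty_equivalent(W, 1, 6, push_to_3))       # ([1, 3, 6], 1)
```

Without disturbances the controller simply replays the first optimal path; the single disturbance triggers exactly one online recomputation.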
34
Discussion
• Doing this, we end up with the same policy as the one obtained with DP in the previous lecture, where disturbances were neglected.
• The policy obtained with DP is explicit, whereas this new (equivalent) one is implicit and requires online computations!
• In the literature this (equivalent) policy is called certainty equivalent control, and it is closely related to model predictive control (to be addressed later).
• To summarize:
  • Optimal paths: DP, or Dijkstra’s (more efficient).
  • Optimal policies: DP (might be computationally hard), Dijkstra’s offline (less efficient), or Dijkstra’s online (requires online computations), i.e., certainty equivalent control.
  • Stochastic DP.
35
Concluding remarks
Summary
• DP can be used to solve shortest path problems in graphs.
• We discussed alternative methods: Dijkstra’s algorithm and A*.
• We introduced certainty equivalent control.
• Main message: there are other methods to compute optimal paths and optimal policies (except for stochastic DP!); their (dis)advantages depend on the application (e.g. can we use online computations?).
After this lecture, you should be able to:
• Compute the shortest path in a graph with DP, Dijkstra’s algorithm and A*.
Appendix A: Solving a DP problem with Dijkstra’s algorithm
[Figure: initial transition diagram]
Example
Consider the same initial transition diagram as in the first lecture and follow steps 1-3:
1. Add an artificial terminal node, with the cost to arrive at it from each state at the final stage equal to the terminal cost of that state.
[Figure: transition diagram with the artificial terminal node added and the nodes numbered 1 to 15]
3. Compute the optimal paths (using Dijkstra’s algorithm) for each state to the artificial terminal node and keep track for each initial state of the first decision of the optimal path (this is the optimal policy)
Example
[Figure: the numbered graph with nodes 1 to 15]
Iteration   Pairs $(i, d_i)$, $i \in$ OPEN
1
2
3
4
0
5
(1, 0)
Initial state 1
(3, 2), (4, 1)
(3, 2), (7, 1), (8, 2)
(3, 2), (10, 2), (11, 4), (8, 2)
(6, 3), (10, 2), (11, 4), (8, 2)
(6, 3), (10, 2), (11, 4), (12, 7)
(6, 3), (14, 4), (13, 7), (11, 4), (12, 7)
(15, 4), (13, 7), (11, 4), (12, 7)
(14, 4), (13, 7), (11, 4), (12, 7)
6
7
8
It is clear that if we consider the states 4, 7, 10 as initial states, the transitions (arrows) along this path are also optimal first decisions for these initial states (they belong to the optimal policy)
Belongs to the optimal policy
(15, 8), (13, 7), (11, 4), (12, 7)
(15, 8), (13, 6), (12, 7)
(15, 6), (12, 7)
9
10
11
Example
[Figure: the numbered graph with nodes 1 to 15]
Iteration   Pairs $(i, d_i)$, $i \in$ OPEN
1
2
3
4
0
5
Initial state
6
7
It is clear that if we consider the states 5, 9, 12, 14 as initial states, the transitions (arrows) along this path are also optimal first decisions for these initial states (they belong to the optimal policy)
Belongs to the optimal policy
(2, 0)
(4, 4), (5, 1)
(4, 4), (9, 4), (8, 5)
(7, 4), (9, 4), (8, 5)
(10, 5), (11, 7) (9, 4), (8, 5)
(10, 5), (11, 7) (12, 5), (8, 5)
(10, 5), (11, 7) (14, 5), (8, 5), (13, 10)
2
(10, 5), (11, 7), (15, 9), (8, 5), (13, 10)
(10, 5), (11, 7), (15, 9), (13, 10)
(11, 7), (15, 9), (13, 10)
(15, 9), (13, 10)
8
9
10
Example
[Figure: the numbered graph with nodes 1 to 15]
Iteration   Pairs $(i, d_i)$, $i \in$ OPEN
1
2
3
4
0
5
Initial state
6
7
Belongs to the optimal policy
3
It is clear that if we consider the states 8, 11, 13 as initial states, the transitions (arrows) along this path are also optimal first decisions for these initial states (they belong to the optimal policy)
(3, 0)
(6, 1), (7, 3), (8, 1)
(10, 5), (11, 4) (7, 3), (8, 1)
(10, 5), (11, 3) (7, 3), (12, 6)
(10, 4), (11, 3) (12, 6)
(10, 4), (13, 7), (14, 6) (12, 6)
(13, 7), (14, 6) (12, 6)
(13, 7), (15, 10), (12, 6)
(15, 7), (12, 6)8
(15, 7), (14, 6)
(15, 7)9
10
Example
[Figure: the numbered graph with nodes 1 to 15]
Iteration   Pairs $(i, d_i)$, $i \in$ OPEN
1
2
3
4
0
5
Initial state
Belongs to the optimal policy
It is clear that if we consider the states 8, 11, 13 as initial states, the transitions (arrows) along this path are also optimal first decisions for these initial states (they belong to the optimal policy)
6
(6, 0)
(10, 4), (11, 3)
(10, 4), (13, 7), (14, 5)
(13, 7), (14, 5)
(13, 7), (15, 9)
(13, 7)
Optimal policy
Example
Combining the first decisions leading to the end stage for each node, we obtain the optimal policy (the same as the one obtained with the DP algorithm in the first lecture).
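A sketch of the same procedure on a small hypothetical transition diagram (not the one from the first lecture, which is only shown in the figures): stage-indexed nodes $(k, x)$ are linked to an artificial terminal node `END` by arrows carrying the terminal costs, Dijkstra’s algorithm is run from every node, and the first decisions are collected. A plain backward DP is included only to check that both give the same costs and policy; all names are hypothetical.

```python
import heapq

# Hypothetical diagram with h = 2 (not the one from the first lecture):
# C[k][(x, y)] is the cost of going from state x at stage k to y at stage k + 1.
C = [{(1, 1): 2, (1, 2): 1, (2, 1): 0, (2, 2): 3},
     {(1, 1): 1, (1, 2): 3, (2, 1): 2, (2, 2): 1}]
G_TERM = {1: 5, 2: 1}       # terminal costs at stage h
H = len(C)
END = (H + 1, 0)            # artificial terminal node at an artificial stage


def to_graph(C, g):
    """Nodes (k, x); arrows (h, x) -> END carry the terminal cost g(x)."""
    w = {END: {}}
    for k, ck in enumerate(C):
        for (x, y), c in ck.items():
            w.setdefault((k, x), {})[(k + 1, y)] = c
    for x, gx in g.items():
        w.setdefault((H, x), {})[END] = gx
    return w


def dijkstra(w, p, t):
    """Heap-based Dijkstra; returns (cost, path) from p to t (t assumed reachable)."""
    d, pi, heap = {p: 0}, {}, [(0, p)]
    while heap:
        di, i = heapq.heappop(heap)
        if di > d[i]:
            continue
        if i == t:
            break
        for j, wij in w[i].items():
            if di + wij < d.get(j, float("inf")):
                d[j], pi[j] = di + wij, i
                heapq.heappush(heap, (d[j], j))
    path = [t]
    while path[-1] != p:
        path.append(pi[path[-1]])
    return d[t], path[::-1]


def backward_dp(C, g):
    """Standard DP, used only to check the result."""
    J, policies = dict(g), []
    for ck in reversed(C):
        newJ, mu = {}, {}
        for (x, y), c in ck.items():
            if c + J[y] < newJ.get(x, float("inf")):
                newJ[x], mu[x] = c + J[y], y
        J = newJ
        policies.append(mu)
    return J, policies[::-1]


W = to_graph(C, G_TERM)
# First decision (next state) of the optimal path from every (stage, state) to END.
policy = [{x: dijkstra(W, (k, x), END)[1][1][1] for x in (1, 2)} for k in range(H)]
cost0 = {x: dijkstra(W, (0, x), END)[0] for x in (1, 2)}
print(policy, cost0)                              # [{1: 2, 2: 1}, {1: 2, 2: 2}] {1: 3, 2: 4}
print(backward_dp(C, G_TERM) == (cost0, policy))  # True
```

Using a tuple for `END` keeps every heap entry comparable when two distances tie.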
Appendix B: DP with terminal constraints
B1
Suppose that we want to reach a given state at the final stage of a transition diagram starting at a given initial state with minimum cost (as opposed to simply reaching the final stage)
[Figure: transition diagram with the initial state and the desired terminal state marked]
Since a transition diagram is simply a weighted graph, we can apply graph search methods, and in particular repeat the trick just used to apply DP.
B2
DP with terminal constraints
1. Relabel the nodes.
[Figure: the weighted graph with final and terminal nodes, with its nodes relabelled 1 to 7]
2. Transform the graph into a transition diagram.
[Figure: the resulting transition diagram over the nodes 1 to 7]
3. Apply DP.
[Figure: the costs-to-go computed at each stage]
B3
DP with terminal constraints
[Figure: the transition diagram with the part relevant to the terminal constraint highlighted]
By inspection we can see that this is the only part that matters.
Conclusion: if there is a terminal constraint:
• Remove the arrows from the states at the final decision stage that do not lead to the desired terminal state.
• For each such state, choose the remaining arrow with minimum cost and set the cost-to-go of that state to the terminal cost of the desired terminal state plus the cost of this arrow. If the state has no arrows left, set its cost-to-go to infinity.
• Apply DP.
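A sketch of the conclusion on a hypothetical two-stage diagram, with desired terminal state 2. Option 1 keeps the diagram and sets the terminal cost of every other terminal state to infinity; option 2 follows the recipe above literally (prune the last-stage arrows, initialise the costs-to-go at the final decision stage, then apply DP). Both give the same costs-to-go, which differ from the unconstrained ones. The diagram and names are assumptions for illustration.

```python
import math

# Hypothetical two-stage diagram: C[k][(x, y)] is the cost from state x at stage k
# to state y at stage k + 1; G[x] is the terminal cost of terminal state x.
C = [{(1, 1): 1, (1, 2): 2},
     {(1, 1): 1, (1, 2): 4, (2, 1): 5, (2, 2): 1}]
G = {1: 0, 2: 3}
TARGET = 2  # desired terminal state


def backward_dp(C, J_next):
    """J_k(x) = min over arrows (x, y) of c_k(x, y) + J_{k+1}(y); inf if none."""
    J = dict(J_next)
    for ck in reversed(C):
        newJ = {}
        for (x, y), c in ck.items():
            newJ[x] = min(newJ.get(x, math.inf), c + J.get(y, math.inf))
        J = newJ
    return J


# Option 1: infinite terminal cost for every terminal state other than TARGET.
J_a = backward_dp(C, {x: (g if x == TARGET else math.inf) for x, g in G.items()})

# Option 2: keep only the last-stage arrows into TARGET, initialise the
# costs-to-go at the final decision stage, then apply DP to the earlier stages.
last = C[-1]
J_last = {x: (last[(x, TARGET)] + G[TARGET] if (x, TARGET) in last else math.inf)
          for x in {x for (x, _) in last}}
J_b = backward_dp(C[:-1], J_last)

print(J_a, J_b)             # {1: 6} {1: 6}
print(backward_dp(C, G))    # {1: 2} (without the terminal constraint)
```

The constraint raises the optimal cost from 2 to 6 because the cheap terminal state 1 is no longer allowed.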