![Page 1: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions](https://reader033.vdocuments.site/reader033/viewer/2022042408/5f237feff589be0c144864be/html5/thumbnails/1.jpg)
Reinforcement learning Challenge & contribution Examples Conclusions
Reinforcement Learning in Continuous State and Action Spaces
Lucian Busoniu
Advisers: Robert Babuška, Bart De Schutter
Delft University of Technology
Center for Systems and Control
Project: Interactive Collaborative Information Systems
13 January 2009
Motivation and focus
Learning can find solutions that:
  are hard to determine a priori
  improve over time
Reinforcement learning:
  uses reward signal as performance feedback
  can work without prior knowledge
Exact RL solutions only in discrete cases:
  this thesis: continuous spaces
  using approximate solutions
Outline
1 Reinforcement learning
2 Challenge & contribution
3 Examples
4 Conclusions
RL problem
Optimal control problem
Example: robot should move to goal in shortest time
Elements of RL
Robot in given state (position X, Y)
Robot takes action (e.g., move forward) and reaches new state
Receives reward = quality of state transition
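The state, action, new state, and reward loop above can be sketched in a few lines. This is only an illustration: the grid size, the goal cell, the action names, and the reward values are assumptions, not taken from the slides.

```python
from typing import Dict, Tuple

State = Tuple[int, int]  # robot position (X, Y) on a hypothetical grid

# Hypothetical action set: each action is a displacement applied to (X, Y).
ACTIONS: Dict[str, Tuple[int, int]] = {
    "forward": (0, 1),
    "back": (0, -1),
    "left": (-1, 0),
    "right": (1, 0),
}
GOAL: State = (5, 5)  # assumed goal cell
GRID_SIZE = 11        # assumed grid with coordinates 0..10


def step(state: State, action: str) -> Tuple[State, float]:
    """Apply an action; return the new state and the reward for the transition."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), GRID_SIZE - 1)  # stay inside the grid
    y = min(max(state[1] + dy, 0), GRID_SIZE - 1)
    next_state = (x, y)
    # Assumed reward: 0 on reaching the goal, -1 for every other step,
    # which favours reaching the goal in the shortest time.
    reward = 0.0 if next_state == GOAL else -1.0
    return next_state, reward
```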
Policy
Control policy: what action to take in every state
u = h(x)
E.g., in state [9, 1], move forward; in state [2, 10], move right
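For a discrete problem, the policy u = h(x) can be stored as a lookup table. A minimal sketch holding only the two entries from the example; the fallback action for unlisted states is an assumption.

```python
from typing import Dict, Tuple

State = Tuple[int, int]

# Tabular policy with the two entries given in the example above.
policy_table: Dict[State, str] = {
    (9, 1): "forward",
    (2, 10): "right",
}


def h(x: State) -> str:
    """Control policy u = h(x): return the action to take in state x."""
    # The default for states not in the table is an assumption of this sketch.
    return policy_table.get(x, "forward")
```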
Performance criterion
Reward = one-step performance
Return = long-term performance, along trajectory
R(x, u) = r₁ + γ r₂ + γ² r₃ + γ³ r₄ + …
0 < γ < 1: discount factor
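For a finite list of rewards, the discounted return can be computed directly. A small sketch; the function name and the truncation to a finite trajectory are assumptions for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return r1 + gamma*r2 + gamma^2*r3 + ... for a finite reward sequence."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("discount factor must satisfy 0 < gamma < 1")
    total = 0.0
    # Accumulate from the last reward backwards: total = r + gamma * total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With rewards [1, 1, 1] and γ = 0.5 this gives 1 + 0.5 + 0.25 = 1.75: later rewards count less.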
Optimality
Goal: obtain maximal return:
R∗(x, u) = r₁ + discounted rewards along best trajectory starting in x₁
Optimal policy h∗ can be computed from R∗
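Computing h∗ from R∗ amounts to picking, in each state, the action with the largest optimal return: h∗(x) = argmax over u of R∗(x, u). A sketch assuming a finite action set and returns stored in a dictionary (both are assumptions for illustration):

```python
from typing import Dict, Hashable, Sequence, Tuple


def greedy_policy(
    returns: Dict[Tuple[Hashable, Hashable], float],
    x: Hashable,
    actions: Sequence[Hashable],
) -> Hashable:
    """Return the action u that maximises returns[(x, u)] in state x."""
    return max(actions, key=lambda u: returns[(x, u)])
```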
Algorithms
Many algorithms available to find optimal R∗, h∗
Some require prior knowledge about the problem, or data
Others work with no prior knowledge and collect data by online interaction
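Tabular Q-learning is one well-known algorithm of the second kind. The slides do not name a specific algorithm, so the sketch below is only an illustration. The corridor environment, the parameter values, and all names are assumptions.

```python
import random
from collections import defaultdict
from typing import Callable, DefaultDict, Sequence, Tuple

StepFn = Callable[[int, int], Tuple[int, float, bool]]


def q_learning(step: StepFn, actions: Sequence[int], start: int,
               episodes: int = 200, max_steps: int = 50, alpha: float = 0.5,
               gamma: float = 0.9, epsilon: float = 0.1,
               seed: int = 0) -> DefaultDict[Tuple[int, int], float]:
    """Learn return estimates q[(x, u)] purely from online interaction."""
    rng = random.Random(seed)
    q: DefaultDict[Tuple[int, int], float] = defaultdict(float)
    for _ in range(episodes):
        x = start
        for _ in range(max_steps):
            if rng.random() < epsilon:
                u = rng.choice(actions)  # explore
            else:
                best = max(q[(x, a)] for a in actions)
                # Exploit, breaking ties between equally good actions at random.
                u = rng.choice([a for a in actions if q[(x, a)] == best])
            x_next, r, done = step(x, u)
            best_next = 0.0 if done else max(q[(x_next, a)] for a in actions)
            # Move the estimate toward reward + discounted best next estimate.
            q[(x, u)] += alpha * (r + gamma * best_next - q[(x, u)])
            x = x_next
            if done:
                break
    return q


def chain_step(x: int, u: int) -> Tuple[int, float, bool]:
    """Hypothetical 5-cell corridor: reward 1 on reaching cell 4, else 0."""
    x_next = min(max(x + u, 0), 4)
    done = x_next == 4
    return x_next, (1.0 if done else 0.0), done
```

Calling `q_learning(chain_step, [-1, 1], start=0)` returns a table in which moving toward cell 4 should score higher than moving away from it.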
Challenge
Classical algorithms have to store R(x, u) for every combination of state x and action u
Only possible when the number of combinations is small
However, x and u are often continuous ⇒ infinitely many combinations!
E.g., for robot, x = [X, Y] continuous
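A quick count shows why a table does not scale. If every state and action dimension is discretised into the same number of points (an assumption made only for this illustration), the number of stored values grows exponentially with the number of dimensions:

```python
def table_entries(points_per_dim: int, state_dims: int, action_dims: int) -> int:
    """Number of R(x, u) values a table needs on a uniform grid."""
    return points_per_dim ** (state_dims + action_dims)


# Robot with state [X, Y] and one action dimension:
coarse = table_entries(10, state_dims=2, action_dims=1)   # 10 points per axis
fine = table_entries(100, state_dims=2, action_dims=1)    # 100 points per axis
```

Refining the grid from 10 to 100 points per axis raises the count from 1,000 to 1,000,000 entries, and a truly continuous space has no finite table at all.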
![Page 22: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions](https://reader033.vdocuments.site/reader033/viewer/2022042408/5f237feff589be0c144864be/html5/thumbnails/22.jpg)
Reinforcement learning Challenge & contribution Examples Conclusions
Challenge
Classical algorithms have to store R(x , u)for every combination of state x and action u
Only possible for when number of combinations is small
However, x and u often continuous⇒ infinitely many combinations!
E.g., for robot, x = [X , Y ] continuous
![Page 23: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions](https://reader033.vdocuments.site/reader033/viewer/2022042408/5f237feff589be0c144864be/html5/thumbnails/23.jpg)
Reinforcement learning Challenge & contribution Examples Conclusions
Challenge
Classical algorithms have to store R(x , u)for every combination of state x and action u
Only possible for when number of combinations is small
However, x and u often continuous⇒ infinitely many combinations!
E.g., for robot, x = [X , Y ] continuous
![Page 24: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions](https://reader033.vdocuments.site/reader033/viewer/2022042408/5f237feff589be0c144864be/html5/thumbnails/24.jpg)
Reinforcement learning Challenge & contribution Examples Conclusions
Challenge
Classical algorithms have to store R(x , u)for every combination of state x and action u
Only possible for when number of combinations is small
However, x and u often continuous⇒ infinitely many combinations!
E.g., for robot, x = [X , Y ] continuous
![Page 25: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions](https://reader033.vdocuments.site/reader033/viewer/2022042408/5f237feff589be0c144864be/html5/thumbnails/25.jpg)
Reinforcement learning Challenge & contribution Examples Conclusions
Contribution
Algorithms for reinforcement learning in problems with continuous states and actions
Using approximate representations of returns
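One common way to approximate returns is a weighted sum of radial basis functions over the continuous state, with one weight vector per discrete action. The slides do not say which approximators the thesis uses, so the class below is a generic sketch. Its name, the Gaussian features, and the gradient update are assumptions.

```python
import math
from typing import List, Sequence


class RBFApproximator:
    """Approximate R(x, u) as sum_i theta[u][i] * phi_i(x)."""

    def __init__(self, centers: Sequence[Sequence[float]], width: float,
                 n_actions: int) -> None:
        self.centers = [tuple(c) for c in centers]
        self.width = width
        # One weight vector per action, all weights starting at zero.
        self.theta: List[List[float]] = [
            [0.0] * len(self.centers) for _ in range(n_actions)
        ]

    def features(self, x: Sequence[float]) -> List[float]:
        """Gaussian bump around each center, evaluated at state x."""
        return [
            math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                     / (2.0 * self.width ** 2))
            for c in self.centers
        ]

    def value(self, x: Sequence[float], u: int) -> float:
        """Approximate return for continuous state x and action index u."""
        return sum(t * p for t, p in zip(self.theta[u], self.features(x)))

    def update(self, x: Sequence[float], u: int, target: float,
               lr: float = 0.1) -> None:
        """One gradient step moving value(x, u) toward the target."""
        phi = self.features(x)
        error = target - self.value(x, u)
        self.theta[u] = [t + lr * error * p for t, p in zip(self.theta[u], phi)]
```

A finite number of weights now covers every state in the continuous space, at the cost of representing returns only approximately.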
Example: RL for inverted pendulum
Goal: point up and stay there
Difficulty: insufficient power, need to swing back & forth
Reward: the closer to vertical, the larger
Replay learned policy
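A reward that grows as the pendulum approaches vertical could look like the sketch below. The slide gives no formula, so the quadratic form and the convention that angle 0 means pointing up are assumptions.

```python
import math


def pendulum_reward(angle: float) -> float:
    """Reward is 0 when pointing up (angle 0) and decreases away from vertical."""
    # Wrap the angle to [-pi, pi] so that full turns count as the same position.
    wrapped = math.atan2(math.sin(angle), math.cos(angle))
    return -(wrapped ** 2)
```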
Example: RL for robot goalkeeper
Catch ball using only video camera image
Summary
Reinforcement learning: very general framework
Can learn from interaction, without prior knowledge
However: classical RL algorithms do not work when states and actions are continuous
Need to use approximate representations (this thesis)
Application areas
1 Control engineering: optimal & learning control (e.g., robot control)
2 Computer science: intelligent agents
3 Economics
4 etc.
Thank you
Thank you!
Questions?