Transcript
![Page 1: Lab 6-2: Q Network for Cart Pole - GitHub Pageshunkim.github.io/ml/RL/rl06-l2.pdf · 2017-10-02 · import numpy as np import tensor flow as tf import gym gym. make( CartPole—vØ](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/1.jpg)
Lab 6-2: Q Network for Cart Pole
Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim <[email protected]>
![Page 2](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/2.jpg)
Cart Pole
https://gym.openai.com/docs
![Page 3](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/3.jpg)
Random trials
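A minimal sketch of what a random trial looks like, assuming the classic Gym interface in which `env.reset()` returns an observation and `env.step(action)` returns `(observation, reward, done, info)`. `ToyEnv` and `run_random_episode` are hypothetical names introduced here so the example runs without Gym installed; `ToyEnv` only mimics CartPole's reward of 1 per step and is not a physics simulation.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym environment (assumption, not CartPole).

    Every step gives reward 1.0 and the episode ends after `horizon` steps.
    """

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0]  # 4 numbers, like a CartPole observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return [0.0, 0.0, 0.0, 0.0], 1.0, done, {}


def run_random_episode(env, n_actions, rng):
    """Play one episode with uniformly random actions; return the total reward."""
    env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = rng.randrange(n_actions)      # pick an action at random
        _, reward, done, _ = env.step(action)  # classic 4-tuple step result
        total_reward += reward
    return total_reward


total = run_random_episode(ToyEnv(horizon=10), n_actions=2, rng=random.Random(0))
print("Total reward for this episode:", total)
```

With a real environment, the same function could be called as `run_random_episode(gym.make("CartPole-v0"), 2, random.Random(0))`, provided the installed Gym version still uses the 4-tuple `step` result assumed above.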
![Page 4](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/4.jpg)
Rewards
![Page 5](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/5.jpg)
Cart Pole Q-network
Diagram: (1) input state s; (2) network output Ws
![Page 6](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/6.jpg)
Q-Network training (Network construction)
Diagram: (1) input state s; (2) network output Ws
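The single-layer network Ws can be sketched in NumPy as one matrix product. This is an illustration, not the slide's TensorFlow code: the sizes (4 state values in, 2 action values out) match CartPole, while the initialization range, the sample state, and the helper name `q_values` are assumptions made for the example.

```python
import numpy as np

input_size = 4   # CartPole observation has 4 numbers
output_size = 2  # two actions: push left, push right

# Assumed initialization: small uniform random weights.
rng = np.random.default_rng(0)
W = rng.uniform(-0.1, 0.1, size=(input_size, output_size))


def q_values(state, W):
    """Return the network output Ws as a (1, output_size) row of Q-values."""
    s = np.asarray(state, dtype=np.float64).reshape(1, -1)
    return s @ W


state = [0.02, -0.01, 0.03, 0.04]  # made-up observation for illustration
Q = q_values(state, W)
greedy_action = int(np.argmax(Q))  # index of the action with the largest Q
print("Q-values:", Q, "greedy action:", greedy_action)
```

Because there is no hidden layer and no bias term, each Q-value is a plain weighted sum of the four state values.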
![Page 7](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/7.jpg)
Q-Network training (linear regression)
Diagram: (1) input state s; (2) network output Ws
$y = r + \gamma \max Q(s')$
$\mathrm{cost}(W) = (Ws - y)^2$
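A worked example of one regression step may help here: compute the target y, measure the squared cost, then move W a little down the gradient. This is a NumPy sketch under stated assumptions, not the lab's TensorFlow code. The values of `gamma` and `learning_rate`, the sample states, the handling of terminal states (y = r), and the helper names `td_target`, `squared_cost`, and `sgd_step` are all chosen for illustration.

```python
import numpy as np

gamma = 0.9          # assumed discount factor
learning_rate = 0.1  # assumed step size


def td_target(reward, next_q, done, gamma):
    """y = r + gamma * max Q(s'); for a terminal step, assume y = r."""
    if done:
        return reward
    return reward + gamma * float(np.max(next_q))


def squared_cost(W, s, action, y):
    """cost(W) = (Ws - y)^2 for the Q-value of the action that was taken."""
    return float((s @ W[:, action] - y) ** 2)


def sgd_step(W, s, action, y, learning_rate):
    """One gradient step on the squared cost, treating y as a constant."""
    error = s @ W[:, action] - y
    W_new = W.copy()
    W_new[:, action] -= learning_rate * 2.0 * error * s  # d cost / d W[:, a]
    return W_new


W = np.zeros((4, 2))                       # start from all-zero weights
s = np.array([0.1, 0.0, -0.1, 0.2])        # made-up current state
s_next = np.array([0.1, 0.1, -0.1, 0.1])   # made-up next state

y = td_target(1.0, s_next @ W, False, gamma)  # W is zero, so y = 1.0
before = squared_cost(W, s, 0, y)
W = sgd_step(W, s, 0, y, learning_rate)
after = squared_cost(W, s, 0, y)
print("target:", y, "cost before:", before, "cost after:", after)
```

Only the column of W for the action that was taken is updated, since only that action's Q-value has an observed target.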
![Page 8](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/8.jpg)
Code: Network and setup
![Page 9](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/9.jpg)
Code: Training
![Page 10](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/10.jpg)
Code: Apply
![Page 11](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/11.jpg)
Results: really poor!
![Page 12](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/12.jpg)
Why does it not work? Too shallow?
![Page 13](https://reader036.vdocuments.site/reader036/viewer/2022062506/5f0322137e708231d407b3a0/html5/thumbnails/13.jpg)
Exercise
• Why does it not work?
• Hint: DQN