Lecture 4: Q-learning (table) - hunkim.github.io/ml/RL/rl04.pdf · 2017-10-02
TRANSCRIPT
[Page 1]
Lecture 4: Q-learning (table)
Exploit & exploration and discounted future reward
Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim <[email protected]>
[Page 2]
Dummy Q-learning algorithm
[Page 3]
Learning Q(s, a)?
[Page 4]
Learning Q(s, a) Table: one success!
[Figure: grid-world Q-table after one successful episode]
[Page 5]
Exploit VS Exploration
http://home.deib.polimi.it/restelli/MyWebSite/pdf/rl5.pdf
[Page 6]
Exploit VS Exploration
Q-values: 0.0, 0.0, 0.0, 0.0, 0.0
[Page 7]
Exploit (weekday) VS Exploration (weekend)
Q-values: 0.5, 0.6, 0.3, 0.2, 0.5
[Page 8]
Exploit VS Exploration: ε-greedy
e = 0.1
if random() < e: a = random action
else: a = argmax(Q(s, a))
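The slide's pseudocode can be sketched as a small Python helper; the `epsilon_greedy` name and the `q_row` list of per-action values are illustrative assumptions, not from the slides.

```python
import random

def epsilon_greedy(q_row, epsilon=0.1):
    """With probability epsilon pick a random action (explore),
    otherwise pick the action with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))            # explore
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploit
```

With the slide's five values [0.5, 0.6, 0.3, 0.2, 0.5] and epsilon = 0, the helper always exploits and returns index 1.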
[Page 9]
Exploit VS Exploration: decaying ε-greedy
for i in range(1000):
    e = 0.1 / (i + 1)
    if random() < e: a = random action
    else: a = argmax(Q(s, a))
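The decaying schedule above can be sketched as a function of the episode index (the name `decaying_epsilon` is an illustrative assumption):

```python
def decaying_epsilon(i, e0=0.1):
    # Slide's schedule: e = 0.1 / (i + 1), so exploration
    # is heavy in early episodes and fades as training progresses.
    return e0 / (i + 1)
```

For example, episode 0 uses e = 0.1, episode 9 uses e = 0.01, and by episode 999 exploration is almost off.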
[Page 10]
Exploit VS Exploration: add random noise
Q-values: 0.5, 0.6, 0.3, 0.2, 0.5
[Page 11]
Exploit VS Exploration: add random noise
Q-values: 0.5, 0.6, 0.3, 0.2, 0.5
a = argmax(Q(s, a) + random_values)
a = argmax([0.5, 0.6, 0.3, 0.2, 0.5] + [0.1, 0.2, 0.7, 0.3, 0.1])  # elementwise add
[Page 12]
Exploit VS Exploration: add random noise
Q-values: 0.5, 0.6, 0.3, 0.2, 0.5
for i in range(1000):
    a = argmax(Q(s, a) + random_values / (i + 1))
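The noise strategy can be sketched with NumPy; the `noisy_greedy` name and explicit `rng` argument are illustrative assumptions:

```python
import numpy as np

def noisy_greedy(q_row, i, rng):
    """Greedy choice over Q-values perturbed by random noise that
    shrinks with the episode index i, as in the slide's loop."""
    noise = rng.random(len(q_row)) / (i + 1)
    return int(np.argmax(np.asarray(q_row) + noise))
```

Early on (small i) the noise can flip the argmax toward any action; for very large i the noise is negligible and the choice is effectively greedy. Unlike ε-greedy, the noise still favors actions with higher Q-values even while exploring.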
[Page 13]
Exploit VS Exploration
[Page 14]
Dummy Q-learning algorithm
[Page 15]
Discounted future reward
[Page 16]
Learning Q(s, a) with discounted reward
Q(s, a) ← r + γ max Q(s′, a′)
[Page 17]
Discounted future reward
[Page 18]
Learning Q(s, a) with discounted reward
[Page 19]
Discounted reward (γ = 0.9)
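With γ = 0.9, a reward received k steps in the future is worth 0.9^k now, so shorter paths to the goal score higher. A small helper makes the computation concrete (the function name and reward lists are illustrative assumptions):

```python
def discounted_return(rewards, gamma=0.9):
    # R = r0 + gamma*r1 + gamma^2*r2 + ..., accumulated right-to-left
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

A reward of 1 reached after three steps is worth 0.9³ = 0.729, while the same reward reached after two steps is worth 0.81, which is why discounting makes the agent prefer the shorter path.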
[Page 20]
Q-learning algorithm
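A minimal sketch of the tabular update Q(s, a) ← r + γ max Q(s′, a′) on a toy deterministic chain. The chain environment is an assumption for illustration only; the lab uses OpenAI Gym's FrozenLake instead.

```python
import numpy as np

# Toy chain: states 0..4, action 0 = left, 1 = right (assumed setup).
# Reaching the last state gives reward 1 and ends the episode.
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9
Q = np.zeros((N_STATES, N_ACTIONS))

def step(s, a):
    """Deterministic transition; reward 1 only at the goal state."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = (s2 == N_STATES - 1)
    return s2, (1.0 if done else 0.0), done

rng = np.random.default_rng(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        a = int(rng.integers(N_ACTIONS))      # explore at random
        s2, r, done = step(s, a)
        # the lecture's update: Q(s,a) <- r + gamma * max_a' Q(s',a')
        Q[s, a] = r + GAMMA * np.max(Q[s2])
        s = s2
```

After training, the Q-values along the rightward path decay by a factor of γ per step back from the goal: Q[3,1] = 1, Q[2,1] = 0.9, Q[1,1] = 0.81, Q[0,1] = 0.729.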
[Page 21]
Q-Table Policy
Q(s, a)
(1) state, s
(2) action, a
(3) quality (reward) for the given action
    (e.g., LEFT: 0.5, RIGHT: 0.1, UP: 0.0, DOWN: 0.8)
[Figure: grid world annotated with state, action, and Q-values]
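As a sketch, the Q-table can be a 2-D array indexed by state and action, and the greedy policy just reads off the argmax per row. The action ordering below is an assumption for illustration, using the slide's example values:

```python
import numpy as np

# One row of the table: per-action quality scores for a single state.
ACTIONS = ["LEFT", "RIGHT", "UP", "DOWN"]
q_row = np.array([0.5, 0.1, 0.0, 0.8])

# Greedy policy: pick the action with the highest Q-value.
best_action = ACTIONS[int(np.argmax(q_row))]
```

Here the greedy policy chooses DOWN, since 0.8 is the largest entry.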
[Page 22]
Convergence
Q converges to the optimal values:
• In deterministic worlds
• With finitely many states
Machine Learning, Tom Mitchell, 1997
[Page 23]
Next
Lab: Q-learning Table