rl2: Q(s,a) easily gets us U(s) and pi(s); the left-arrow with alpha above it means "move toward the target, weighted by alpha" (see the sketch below).
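A minimal sketch of both ideas (my tabular representation, q[s][a] as a dict of dicts; not code from the lecture):

def utility(q, s):
    # U(s) = max_a Q(s, a)
    return max(q[s].values())

def greedy_policy(q, s):
    # pi(s) = argmax_a Q(s, a)
    return max(q[s], key=q[s].get)

def step_toward(x, target, alpha):
    # "x <-_alpha target": move x toward target, weighted by alpha,
    # i.e. x <- (1 - alpha) * x + alpha * target
    return x + alpha * (target - x)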
TRANSCRIPT
• t, r -> planner -> policy (Dyn. Prog.)
• s, a, s', r -> learner -> policy (Monte Carlo)
• s, a, s', r -> modeler -> t, r
• model -> simulate -> s, a, s', r
• Q-Learning?
• TD?
• DP, MC, TD
• Model vs. model-free, online/offline updating, bootstrapping, on-policy/off-policy (see the TD(0) sketch below)
• Batch updating vs. online updating
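For reference while comparing along these dimensions, a hedged sketch of one tabular TD(0) update (V is a dict; the alpha and gamma defaults are illustrative, not from the lecture). It is online (applied after every transition) and it bootstraps (the target reuses the current estimate of V at the next state, where MC would wait for the full return):

def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    # Bootstrapped target: r + gamma * V[s_next] uses the current estimate
    # of the next state's value instead of the full sampled return.
    target = r + gamma * V[s_next]
    # Online: applied immediately after the single transition <s, r, s_next>.
    V[s] += alpha * (target - V[s])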
Ex. 6.4
• V(B) = ¾
• V(A) = 0? Or ¾? (worked numerically below)
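The tension is easy to check numerically. A sketch assuming the eight episodes of Sutton & Barto's Example 6.4 as I recall them (one episode A,0,B,0; six episodes B,1; one episode B,0 -- the data is not quoted in this transcript):

episodes = [(['A', 'B'], [0, 0])] + [(['B'], [1])] * 6 + [(['B'], [0])]

# Batch MC: average the return observed after each first visit to the state.
def mc_value(state):
    returns = [sum(rs[ss.index(state):]) for ss, rs in episodes if state in ss]
    return sum(returns) / len(returns)

print(mc_value('B'))        # 0.75 -- six of the eight B-episodes paid 1
print(mc_value('A'))        # 0.0  -- the single episode through A returned 0

# Batch TD / certainty-equivalence view: A always moved to B with reward 0,
# so V(A) = 0 + V(B) = 0.75 -- hence the "0? Or 3/4?" question.
print(0 + mc_value('B'))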
• 4 is the terminal state
• V(3) = 0.5
• TD(0) here is better than MC for V(2). Why?
• Visit state 2 for the kth time when state 3 has been visited 10k times
• Variance for MC will be much higher than for TD(0) because of bootstrapping: the TD(0) target reuses the well-estimated V(3) (simulation sketch below)
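A simulation sketch of why. The transcript does not spell out the MDP, so the chain below is an assumption: 2 -> 3 -> 4 with 4 terminal, reward 0 on 2 -> 3, and R(3,a,4) ~ Bernoulli(0.5), which gives V(3) = 0.5. The MC target for V(2) is the full sampled return (noisy); the TD(0) target is r + V_hat(3), which inherits the stability of an estimate built from many visits to 3:

import random

random.seed(0)
rewards = [random.random() < 0.5 for _ in range(10_000)]  # many visits to state 3
v3_hat = sum(rewards) / len(rewards)                      # well estimated, ~0.5

mc_targets = [float(r) for r in rewards[:100]]   # full return from state 2: 0 or 1
td_targets = [0.0 + v3_hat] * 100                # r(2->3)=0 plus bootstrapped V(3)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(variance(mc_targets))  # ~0.25: the MC target carries the reward noise
print(variance(td_targets))  # 0.0: the TD(0) target is the stable estimate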
• Change the example so that R(3,a,4) is deterministic
• Now, MC would be faster
• What happened in the 1st episode?
Q-Learning
• Decaying learning rate?
• Decaying exploration rate?
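Both are standard tricks for making Q-learning converge in practice. A sketch of one common (assumed, not prescribed by the lecture) choice: epsilon decays over time so exploration dies out, and alpha decays as 1/n per (s, a) pair, which satisfies the usual stochastic-approximation conditions (sum of alpha infinite, sum of alpha squared finite):

import random

def q_learning_episode(env, Q, visits, t, gamma=0.99):
    # env is a hypothetical interface -- reset() -> s, step(a) -> (s2, r, done),
    # plus an `actions` list -- standing in for whatever environment is at hand.
    s = env.reset()
    done = False
    while not done:
        t += 1
        eps = 1.0 / (1.0 + 0.01 * t)            # decaying exploration rate
        if random.random() < eps:
            a = random.choice(env.actions)      # explore
        else:
            a = max(env.actions, key=lambda a2: Q[(s, a2)])  # exploit
        s2, r, done = env.step(a)
        visits[(s, a)] += 1
        alpha = 1.0 / visits[(s, a)]            # decaying learning rate (1/n)
        target = r if done else r + gamma * max(Q[(s2, a2)] for a2 in env.actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
    return t

# Usage sketch: Q and visits as collections.defaultdict(float) / (int),
# t = 0, then t = q_learning_episode(env, Q, visits, t) once per episode.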