Part 4: Monte Carlo Learning to solve Blackjack

Introduction to Reinforcement Learning

R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction


Monte Carlo Methods for Solving MDPs

❐ Monte Carlo methods are learning methods
  !  Experience → values, policy

❐ Monte Carlo methods can be used in two ways:
  !  On-line: No model necessary and still attains optimality
  !  Simulated: Planning without a full probability model

❐ Monte Carlo methods learn from complete sample returns
  !  Only defined for episodic tasks


Monte Carlo Policy Evaluation

❐ Goal: learn vπ

❐ Given: some number of episodes under π which contain s

❐ Idea: Average returns observed after visits to s

❐ Every-Visit MC: average returns for every time s is visited in an episode

❐ First-visit MC: average returns only for first time s is visited in an episode

❐ Both converge asymptotically to vπ



First-visit Monte Carlo policy evaluation
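As a companion to this slide, a minimal first-visit MC prediction sketch in Python. It assumes episodes are supplied as lists of (state, reward) pairs by a caller-provided generate_episode(), with a discount factor gamma; the function and variable names are illustrative, not from the slides.

```python
from collections import defaultdict

def first_visit_mc(generate_episode, num_episodes, gamma=1.0):
    """First-visit Monte Carlo prediction of v_pi.

    generate_episode() is assumed to return one episode as a list of
    (state, reward) pairs, where reward is received on leaving that state.
    """
    returns_sum = defaultdict(float)    # sum of returns following first visits to s
    returns_count = defaultdict(int)    # number of first visits to s observed so far
    V = defaultdict(float)              # value estimates = average of observed returns

    for _ in range(num_episodes):
        episode = generate_episode()

        # Time step of the first visit to each state in this episode.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)

        # Walk backwards so G is the return following time step t.
        G = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G = gamma * G + r
            if first_visit[s] == t:                 # average only over first visits
                returns_sum[s] += G
                returns_count[s] += 1
                V[s] = returns_sum[s] / returns_count[s]
    return V
```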


Backup diagram for Monte Carlo

❐ Entire episode included

❐ Only one choice at each state

❐ Time required to estimate one state does not depend on the total number of states


Blackjack example

❐ Object: Have your card sum be greater than the dealer's without exceeding 21.

❐ States (200 of them):
  !  current sum (12-21)
  !  dealer's showing card (ace-10)
  !  do I have a usable ace?

❐ Reward: +1 for winning, 0 for a draw, -1 for losing

❐ Actions: stick (stop receiving cards), hit (receive another card)

❐ Policy: Stick if my sum is 20 or 21, else hit
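A minimal sketch of playing one hand of this Blackjack game under the fixed stick-on-20-or-21 policy, assuming an infinite deck (each card drawn independently, face cards counted as 10, ace as 1 or 11); all names are illustrative.

```python
import random

def draw_card():
    # Infinite deck: ace = 1, face cards count as 10.
    return min(random.randint(1, 13), 10)

def hand_value(cards):
    """Return (total, usable_ace), counting one ace as 11 if that does not bust."""
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False

def play_episode():
    """Play one hand under the fixed policy: stick on 20 or 21, otherwise hit.

    Returns a list of (state, reward) pairs with state =
    (player_sum, dealer_showing, usable_ace); only the last pair
    carries a non-zero reward (+1 win, 0 draw, -1 loss).
    """
    player = [draw_card(), draw_card()]
    dealer = [draw_card(), draw_card()]
    episode = []

    # Player's turn.
    while True:
        total, usable = hand_value(player)
        if total < 12:                      # sums below 12: always hit, no decision recorded
            player.append(draw_card())
            continue
        state = (total, dealer[0], usable)
        episode.append((state, 0))
        if total >= 20:                     # policy says stick
            break
        player.append(draw_card())          # policy says hit
        if hand_value(player)[0] > 21:      # bust: lose immediately
            episode[-1] = (state, -1)
            return episode

    # Dealer's turn: fixed rule, hit until reaching 17 or more.
    while hand_value(dealer)[0] < 17:
        dealer.append(draw_card())

    player_total = hand_value(player)[0]
    dealer_total = hand_value(dealer)[0]
    if dealer_total > 21 or player_total > dealer_total:
        reward = 1
    elif player_total == dealer_total:
        reward = 0
    else:
        reward = -1
    episode[-1] = (episode[-1][0], reward)
    return episode
```

Episodes produced this way can be passed as generate_episode to the first-visit MC sketch earlier to estimate vπ for this fixed policy.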


Blackjack value functions


Monte Carlo Exploring Starts

Fixed point is the optimal policy π*

Convergence now proven (almost)
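A minimal sketch of Monte Carlo control with exploring starts, assuming a generative model env_step(s, a) that returns (next_state, reward, done) and finite lists of states and actions; first-visit averaging of returns updates Q, and the policy is made greedy at each updated state. All names are illustrative.

```python
import random
from collections import defaultdict

def mc_exploring_starts(env_step, states, actions, num_episodes, gamma=1.0):
    """Monte Carlo control with exploring starts.

    env_step(s, a) is assumed to be a generative model returning
    (next_state, reward, done); every (state, action) pair can start
    an episode, which is what guarantees continued exploration.
    """
    Q = defaultdict(float)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    policy = {s: random.choice(actions) for s in states}

    for _ in range(num_episodes):
        # Exploring start: episode begins from a random state-action pair.
        s, a = random.choice(states), random.choice(actions)
        episode, done = [], False
        while not done:
            next_s, r, done = env_step(s, a)
            episode.append((s, a, r))
            s = next_s
            if not done:
                a = policy[s]                      # follow the current policy afterwards

        # First-visit returns for each (state, action) pair in the episode.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)

        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = gamma * G + r
            if first_visit[(s, a)] == t:
                returns_sum[(s, a)] += G
                returns_count[(s, a)] += 1
                Q[(s, a)] = returns_sum[(s, a)] / returns_count[(s, a)]
                policy[s] = max(actions, key=lambda b: Q[(s, b)])   # greedy improvement at s
    return policy, Q
```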


Blackjack example continued

❐ Exploring starts

❐ Initial policy as described before


On-policy Monte Carlo Control

❐ On-policy: learn about the policy currently being executed

❐ How do we get rid of exploring starts?
  !  Need soft policies: π(a|s) > 0 for all s and a
  !  e.g. ε-soft (ε-greedy) policy:

     π(a|s) = 1 − ε + ε/|A(s)|   for the greedy action
     π(a|s) = ε/|A(s)|           for each non-max action

❐ Similar to GPI: move policy towards greedy policy (i.e. ε-greedy)

❐ Converges to best ε-soft policy
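A minimal sketch of the ε-soft (ε-greedy) selection rule above, assuming action values are stored in a dict Q keyed by (state, action) pairs; names are illustrative.

```python
import random

def epsilon_soft_probs(Q, actions, state, epsilon):
    """Action probabilities of an epsilon-greedy (hence epsilon-soft) policy.

    The greedy action gets 1 - epsilon + epsilon/|A(s)|, every other action
    gets epsilon/|A(s)|, so pi(a|s) > 0 for all actions.
    """
    base = epsilon / len(actions)
    greedy = max(actions, key=lambda a: Q.get((state, a), 0.0))
    return {a: base + (1.0 - epsilon) * (a == greedy) for a in actions}

def sample_action(Q, actions, state, epsilon):
    probs = epsilon_soft_probs(Q, actions, state, epsilon)
    return random.choices(actions, weights=[probs[a] for a in actions])[0]
```

With ε = 0.1 and the two Blackjack actions (stick, hit), the greedy action is chosen with probability 1 − 0.1 + 0.1/2 = 0.95 and the other with 0.05. Replacing the greedy improvement step in the exploring-starts sketch with this selection rule (and starting episodes normally) gives an on-policy control method of the kind described here.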


Summary for Monte Carlo Methods

❐ Monte Carlo methods learn directly from experience
  !  On-line: No model necessary and still attains optimality
  !  Simulated: No need for a full model

❐ Monte Carlo methods learn from complete sample returns
  !  Only defined for episodic tasks

❐ The Monte Carlo exploring starts algorithm avoids the vexing issue of maintaining exploration