
Reinforcement Learning
Michael Roberts

With material from: Reinforcement Learning: An Introduction, Sutton & Barto (1998)

What is RL?

• Trial & error learning
  – without model
  – with model

• Structure

[Figure: a chain of states s1 → s2 → s3 → s4, with rewards r1, r2, r3 received on the transitions between them]

RL vs. Supervised Learning

• Evaluative vs. Instructional feedback

• Role of exploration

• On-line performance

K-armed Bandit Problem

[Figure: an agent selects among K actions and tracks each action's average reward]

Action | Observed rewards      | Average reward
a1     | 0, 0, 5, 10, 35       | 10
a2     | 5, 10, -15, -15, -10  | -5
a3     | (not shown)           | 100
a4     | (not shown)           | 0

K-armed Bandit Cont.

• Greedy exploration
• ε-greedy
• Softmax

Average reward:

$$Q_k = \frac{r_1 + r_2 + \cdots + r_k}{k}$$

Incremental formula:

$$Q_{k+1} = Q_k + \alpha \left( r_{k+1} - Q_k \right)$$

where: α = 1 / (k+1)

Probability of choosing action a (softmax, with temperature τ):

$$P(a) = \frac{e^{Q(a)/\tau}}{\sum_b e^{Q(b)/\tau}}$$
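A minimal sketch of ε-greedy selection combined with the incremental average (the Gaussian reward simulator and the arm means are illustrative, not from the slides):

```python
import random

def run_bandit(true_means, epsilon=0.1, steps=1000):
    """epsilon-greedy K-armed bandit with incremental averaging."""
    k = len(true_means)
    Q = [0.0] * k          # estimated average reward per action
    n = [0] * k            # pull counts per action
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(k)                # explore
        else:
            a = max(range(k), key=lambda i: Q[i])  # exploit (greedy)
        r = random.gauss(true_means[a], 1.0)       # sample a noisy reward
        n[a] += 1
        Q[a] += (r - Q[a]) / n[a]  # incremental formula, alpha = 1/(k+1)
    return Q

# e.g. four arms with the average rewards from the table above
print(run_bandit([10, -5, 100, 0]))
```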

More General Problems

• More than one state
• Delayed rewards

• Markov Decision Process (MDP) (see the sketch after this list)
  – Set of states
  – Set of actions
  – Reward function
  – State transition function

• Table or Function Approximation
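One concrete way to write the four MDP components down as tables (a hypothetical two-state sketch in the spirit of the recycling robot below; the numbers are invented):

```python
# An MDP as plain tables: states, actions, and for each (state, action)
# a list of (probability, next_state, reward) outcomes.
states  = ["high", "low"]        # battery charge levels
actions = ["search", "wait"]

# transition function P and reward function R folded together:
# P[(s, a)] = [(prob, s', r), ...]  -- probabilities sum to 1
P = {
    ("high", "search"): [(0.7, "high", 5.0), (0.3, "low", 5.0)],
    ("high", "wait"):   [(1.0, "high", 1.0)],
    ("low",  "search"): [(0.6, "low",  5.0), (0.4, "high", -3.0)],
    ("low",  "wait"):   [(1.0, "low",  1.0)],
}
```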

Example: Recycling Robot

Recycling Robot: Transition Graph

Dynamic Programming

Backup Diagram

[Backup diagram: state s backs up over three actions (π(s,a) = .25 each shown), whose successor transition probabilities are .5/.5, .3/.7, .6/.4, with rewards 10, 5, 200, 200, -10, 1000]
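For reference, the backup such a diagram encodes is the Bellman equation for $V^{\pi}$, in Sutton & Barto's (1998) notation:

$$V^{\pi}(s) = \sum_{a} \pi(s,a) \sum_{s'} \mathcal{P}^{a}_{ss'} \left[ \mathcal{R}^{a}_{ss'} + \gamma V^{\pi}(s') \right]$$

Each successor's value is discounted by γ and weighted by the policy probability and the transition probability.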

Dynamic Programming: Optimal Policy

Backup for Optimal Policy
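The backup for the optimal policy replaces the expectation over actions with a max:

$$V^{*}(s) = \max_{a} \sum_{s'} \mathcal{P}^{a}_{ss'} \left[ \mathcal{R}^{a}_{ss'} + \gamma V^{*}(s') \right]$$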

Performance Metrics

• Eventual convergence to optimality

• Speed of convergence to optimality

• Regret

(Kaelbling, L., Littman, M., & Moore, A. 1996)
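For reference, regret (the third metric) is the standard measure of expected reward lost to not always acting optimally; after $T$ steps, with optimal expected per-step reward $\mu^{*}$:

$$\mathrm{Regret}(T) = T\mu^{*} - \mathbb{E}\left[ \sum_{t=1}^{T} r_t \right]$$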

Gridworld Example

Initialize $V$ arbitrarily, e.g. $V(s) = 0$, for all $s \in S^{+}$

Repeat
    $\Delta \leftarrow 0$
    For each $s \in S$:
        $v \leftarrow V(s)$
        $V(s) \leftarrow \max_a \sum_{s'} \mathcal{P}^{a}_{ss'} \left[ \mathcal{R}^{a}_{ss'} + \gamma V(s') \right]$
        $\Delta \leftarrow \max(\Delta, |v - V(s)|)$
until $\Delta < \theta$ (a small positive number)

Output a deterministic policy $\pi$ such that:
    $\pi(s) = \arg\max_a \sum_{s'} \mathcal{P}^{a}_{ss'} \left[ \mathcal{R}^{a}_{ss'} + \gamma V(s') \right]$
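A direct transcription of that box into code, reusing the MDP tables sketched earlier (a minimal sketch; names and defaults are illustrative):

```python
def value_iteration(states, actions, P, gamma=0.9, theta=1e-6):
    """Tabular value iteration; P[(s, a)] = [(prob, s', r), ...]."""
    V = {s: 0.0 for s in states}               # initialize V(s) = 0
    while True:                                # Repeat ... until
        delta = 0.0
        for s in states:
            v = V[s]
            V[s] = max(sum(p * (r + gamma * V[s2])
                           for p, s2, r in P[(s, a)])
                       for a in actions)       # Bellman optimality backup
            delta = max(delta, abs(v - V[s]))
        if delta < theta:
            break
    # output a deterministic (greedy) policy
    pi = {s: max(actions,
                 key=lambda a: sum(p * (r + gamma * V[s2])
                                   for p, s2, r in P[(s, a)]))
          for s in states}
    return V, pi

# e.g., with the states/actions/P tables from the MDP sketch above:
# V, pi = value_iteration(states, actions, P)
```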

Temporal Difference Learning

• RL without a model
• Issue of temporal credit assignment
• Bootstraps like DP

• TD(0):

$$V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]$$

TD Learning

• Again, TD(0):

$$V(s_t) \leftarrow V(s_t) + \alpha \delta_t, \qquad \delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$$

• TD(λ), applied to every state s:

$$V(s) \leftarrow V(s) + \alpha \, \delta_t \, e_t(s)$$

where e is called an eligibility trace, accumulated as $e_t(s) = \gamma \lambda \, e_{t-1}(s) + \mathbf{1}[s = s_t]$
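A tabular sketch of online TD(λ) with accumulating traces (setting λ = 0 recovers TD(0)); the env interface (reset / step / sample_action) is an assumed stand-in, not from the slides:

```python
from collections import defaultdict

def td_lambda_episode(env, V, alpha=0.1, gamma=0.9, lam=0.8):
    """One episode of tabular TD(lambda) with accumulating traces.
    V: defaultdict(float) of state values.
    env (assumed interface): reset() -> s, step(a) -> (s', r, done)."""
    e = defaultdict(float)                  # eligibility traces e(s)
    s, done = env.reset(), False
    while not done:
        a = env.sample_action(s)            # behavior policy (assumed)
        s2, r, done = env.step(a)
        delta = r + gamma * V[s2] * (not done) - V[s]   # TD error
        e[s] += 1.0                         # bump trace for current state
        for x in e:                         # credit all recently visited states
            V[x] += alpha * delta * e[x]
            e[x] *= gamma * lam             # decay traces
        s = s2
    return V
```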

Backup Diagram for TD(λ)

TD-Gammon (Tesauro)

Additional Work

• POMDPs

• Macros

• Multi-agent RL

• Multiple reward structures
