# Recent Advances in Hierarchical Reinforcement Learning

Post on 18-Oct-2020

PIGML Seminar - AirLab

Recent Advances in Hierarchical Reinforcement Learning

Authors: Andrew Barto, Sridhar Mahadevan

Speaker: Alessandro Lazaric

Outline

Introduction to Reinforcement Learning
• Reinforcement Learning Inspirations and Foundations
• Markov Decision Processes (MDPs) and Q-learning

Hierarchical Reinforcement Learning
• From MDPs to SMDPs
• Options Framework
• MAXQ Value Function Decomposition
• Other Approaches to Hierarchical Reinforcement Learning
• Future/Current/Past Research

RL as… Animal Psychology

Of several responses [actions] made to the same situation, those which are followed by satisfaction to the animal will be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are followed by discomfort to the animal will have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond. (Thorndike, 1911, p. 244)

RL as… Neuroscience

Much evidence suggests that dopamine cells play an important role in reinforcement and action learning

Electrophysiological studies support a theory that dopamine cells signal a global prediction error for summed future reinforcement in appetitive conditioning tasks in the form of a temporal difference (TD) prediction error term

[Figure: reinforcement signal R]

Kakade & Dayan (2002)

RL as… Artificial Intelligence

An artificial agent (either software or hardware) is placed in an environment

The agent
• perceives the state of the environment
• acts on the environment through actions
• has a goal (planning)
• receives rewards from a critic

States S, actions A, reward R(s,a)

[Figure: Environment, Agent, and Critic blocks linked by States, Actions, and Reward arrows]
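The interaction between agent, environment, and critic can be sketched in a few lines of Python. Everything in this sketch is illustrative: the `LineWorld` environment, the `critic` function, and the random agent are hypothetical stand-ins and not part of the original talk.

```python
import random

class LineWorld:
    """Hypothetical environment: positions 0..4 on a line, goal at position 4."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to [0, 4]
        self.state = min(4, max(0, self.state + action))
        return self.state

def critic(state, action):
    """Hypothetical reward R(s, a): 1 for stepping right from state 3 into the goal."""
    return 1.0 if (state == 3 and action == 1) else 0.0

def run_episode(max_steps=100, seed=0):
    rng = random.Random(seed)
    env = LineWorld()
    state, total_reward, steps = env.state, 0.0, 0
    while state != 4 and steps < max_steps:
        action = rng.choice([-1, 1])            # the agent acts (here: at random)
        total_reward += critic(state, action)   # the critic evaluates the pair (s, a)
        state = env.step(action)                # the environment moves to a new state
        steps += 1
    return state, total_reward, steps
```

A learning agent would replace the random choice with a policy that is improved from the observed rewards.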

RL as… Optimal Control

A control system has sensors (i.e., states), actuators (i.e., actions), and costs (i.e., rewards)

The environment is a stochastic dynamical system

Often, the system can be formalized as a Markov Decision Process

Optimal control

RL as… Discrete Time Differential Equations

Value function

Action value function

Bellman equations

Bellman (1957a)
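In standard textbook notation, these three quantities are usually written as follows. The sketch assumes a stochastic policy π(a|s) together with the transition model P, the reward R(s,a), and the discount factor γ named elsewhere in the talk; it is not necessarily the exact form shown by the speaker.

```latex
\begin{align}
V^{\pi}(s) &= \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\middle|\, s_0 = s\right] \\
Q^{\pi}(s,a) &= \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\middle|\, s_0 = s,\ a_0 = a\right] \\
V^{\pi}(s) &= \sum_{a} \pi(a \mid s) \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^{\pi}(s') \Big] \\
Q^{\pi}(s,a) &= R(s,a) + \gamma \sum_{s'} P(s' \mid s,a) \sum_{a'} \pi(a' \mid s')\, Q^{\pi}(s',a')
\end{align}
```

The first two lines define the value and action value functions; the last two are the corresponding Bellman equations, which express each value recursively in terms of the values of successor states.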

RL as… Operations Research

Optimal functions

Dynamic Programming (given P and R)

Bellman (1957b)
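When P and R are known, the optimal value function can be computed by dynamic programming. The sketch below runs value iteration, one standard dynamic programming method and not necessarily the one shown in the talk, on a hypothetical two-state, two-action MDP. The structures `P` and `R`, the discount `gamma`, and the tolerance `tol` are illustrative assumptions.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (probability, next_state); R[s][a] is the expected reward."""
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s2 P(s2|s,a) V(s2)
        Q = [[R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
              for a in range(len(P[s]))]
             for s in range(n_states)]
        V_new = [max(q_s) for q_s in Q]
        if max(abs(v1 - v0) for v1, v0 in zip(V_new, V)) < tol:
            policy = [q_s.index(max(q_s)) for q_s in Q]  # greedy policy
            return V_new, policy
        V = V_new

# Hypothetical MDP: in state 0, action 1 moves to state 1;
# in state 1, action 1 stays there and pays reward 1.
P = [
    [[(1.0, 0)], [(1.0, 1)]],   # state 0: action 0 stays, action 1 goes to state 1
    [[(1.0, 0)], [(1.0, 1)]],   # state 1: action 0 goes back, action 1 stays
]
R = [
    [0.0, 0.0],
    [0.0, 1.0],
]

V_star, policy = value_iteration(P, R)
```

With γ = 0.9 the values converge to approximately 9 for state 0 and 10 for state 1, and the greedy policy picks action 1 in both states.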

RL as… a Milkshake

Operations Research

Bellman Equations

Animal Psychology

Optimal Control

Neuroscience

RL as… a Machine Learning Paradigm!

Reinforcement Learning is the most general Machine Learning paradigm

RL is learning how to map states to actions so as to maximize a numerical reward in the long run

RL is a multi-step decision-making process (often Markovian)

An RL agent learns through a model-free trial-and-error process

Actions may affect not only the immediate reward but also subsequent rewards (delayed effect)

Reinforcement Learning Framework

Markov Decision Process (MDP)
• Set of states
• Set of actions
• Transition model
• Reward function
• Discount factor: γ

Solution of an MDP
• Optimal (action) value function
• Optimal policy

[Figure: 5×5 grid world with states numbered 0 to 24, row by row]
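As an illustration of the MDP ingredients, the sketch below encodes a 5×5 grid world with states 0 to 24 numbered row by row, as on the slide. The four moves, the deterministic transitions, the goal at state 24, the reward of 1 for entering it, and γ = 0.9 are assumptions made for the example; the slides do not specify them.

```python
N = 5                                   # grid side, so states are 0 .. 24
STATES = list(range(N * N))             # set of states
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GOAL = 24                               # assumed goal state (bottom-right corner)
GAMMA = 0.9                             # assumed discount factor

def transition(state, action):
    """Deterministic transition model: move one cell, or stay put at a wall."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < N and 0 <= new_col < N:
        return new_row * N + new_col
    return state

def reward(state, action):
    """Assumed reward function: 1 for a move that enters the goal, 0 otherwise."""
    return 1.0 if state != GOAL and transition(state, action) == GOAL else 0.0
```

For example, `transition(0, "right")` returns 1, `transition(0, "up")` returns 0 because of the wall, and `reward(23, "right")` returns 1.0.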

Reinforcement Learning: Q-learning

Q-learning
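The Q-learning update is usually written as Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') − Q(s,a)]. Below is a minimal tabular sketch on a hypothetical 5-state chain; the environment, the learning rate `alpha`, the exploration rate `epsilon`, and the number of episodes are all assumptions chosen for illustration.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain: action 1 moves right, action 0 moves left.
    Reaching the last state ends the episode with reward 1."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action selection, breaking ties at random
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = min(goal, s + 1) if a == 1 else max(0, s - 1)
            r = 1.0 if s2 == goal else 0.0
            # bootstrapped target; the goal is terminal, so its value is 0
            target = r + (0.0 if s2 == goal else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
```

Note that the update uses only sampled transitions (s, a, r, s'), never the transition model itself, which is what makes the method model-free.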

An Example of Reinforcement Learning

http://www.fe.dis.titech.ac.jp/~gen/robot/robodemo.html

The need for Hierarchical RL

Curse of dimensionality: applying Reinforcement Learning to problems with large state and/or action spaces is infeasible

Abstraction: state and temporal abstractions simplify the problem

Prior knowledge: complex tasks can often be decomposed into a hierarchy of sub-tasks

Solution: sub-tasks can be effectively solved by Reinforcement Learning approaches

Reuse: sub-tasks and abstract actions can be reused across different tasks in the same domain

Hierarchical Reinforcement Learning

The hierarchical approach to RL introduces temporal abstraction into the Reinforcement Learning framework

Temporal abstraction appears under many names:
• Macro-operators
• Temporally extended actions
• Options
• Sub-tasks
• Skills
• Behaviors
• Modes

Hierarchical Reinforcement Learning

From MDPs to SMDPs: with temporally extended actions, we need to account for the amount of time that passes between decision instants

Semi-Markov Decision Processes
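In the discrete-time SMDP formulation, the transition model also specifies the number of time steps τ that an action takes. A standard way to write the resulting Bellman optimality equations is sketched below. Here P(s', τ | s, a) is the joint distribution of next state and duration, and R(s,a) is taken to be the expected discounted reward accumulated while the action runs; this notation follows the usual convention and is an assumption about the symbols used on the slide.

```latex
\begin{align}
V^{*}(s) &= \max_{a} \Big[ R(s,a) + \sum_{s',\tau} \gamma^{\tau} P(s',\tau \mid s,a)\, V^{*}(s') \Big] \\
Q^{*}(s,a) &= R(s,a) + \sum_{s',\tau} \gamma^{\tau} P(s',\tau \mid s,a)\, \max_{a'} Q^{*}(s',a')
\end{align}
```

Setting τ = 1 for every action recovers the ordinary MDP equations, so the only change is that the discount now depends on how long each action lasts.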

Hierarchical RL Approaches

Options Framework

MAXQ Value Function Decomposition

Hierarchies of Abstract Machines

Options Framework

An option o is defined as:
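In the formulation of Sutton, Precup and Singh, an option is a triple ⟨I, π, β⟩: an initiation set I of states where the option may start, a policy π followed while it runs, and a termination condition β(s) giving the probability of stopping in state s. The sketch below encodes this triple and executes one option while accumulating the discounted reward and the duration. The `Option` class, the `run_option` helper, and the toy chain environment are hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    initiation_set: Set[int]                 # I: states where the option can be started
    policy: Callable[[int], int]             # pi: maps a state to a primitive action
    termination: Callable[[int], float]      # beta: probability of terminating in a state

def run_option(option, state, step, gamma=0.9, rng=None):
    """Run an option until it terminates.
    Returns (final_state, discounted_reward, duration)."""
    rng = rng or random.Random(0)
    if state not in option.initiation_set:
        raise ValueError("option cannot be initiated in this state")
    total, discount, tau = 0.0, 1.0, 0
    while True:
        state, r = step(state, option.policy(state))
        total += discount * r
        discount *= gamma
        tau += 1
        if rng.random() < option.termination(state):
            return state, total, tau

# Toy chain 0..4: action 1 moves right; entering state 4 pays reward 1.
def chain_step(state, action):
    nxt = min(4, state + 1) if action == 1 else max(0, state - 1)
    return nxt, (1.0 if nxt == 4 and state != 4 else 0.0)

go_right = Option(
    initiation_set={0, 1, 2, 3},
    policy=lambda s: 1,
    termination=lambda s: 1.0 if s == 4 else 0.0,
)
final_state, ret, tau = run_option(go_right, 0, chain_step)
```

Started in state 0, the option runs for four steps, ends in state 4, and collects a discounted reward of 0.9³ ≈ 0.729. The triple it returns is what an SMDP-style Q-learning update would consume: Q(s,o) ← Q(s,o) + α [ret + γ^τ max_o' Q(s',o') − Q(s,o)].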

Options Framework

Between MDPs and SMDPs

• MDP: discrete time, homogeneous discount
• SMDP: continuous time, discrete events, interval-dependent discount
• Options over MDP: discrete time, overlaid discrete events, interval-dependent discount

[Figure: state trajectories plotted against time for the three cases]

Sutton (1999)

Options Framework

The introduction of options leads to a straightforward redefinition of all the elemen