Post on 18-Mar-2016

Page 1: Transfer Learning in Sequential Decision Problems: A Hierarchical Bayesian Approach

Aaron Wilson, Alan Fern, Prasad Tadepalli
School of EECS, Oregon State University

Page 2

Markov Decision Processes

MDP: $(S, A, M, R)$
M: $s, a \to s'$
R: $s, a \to r$
Policy $\pi: s \to a$

[Figure: agent-environment loop. The environment sends $s_t, r_t$ to the agent; the agent responds with $a_t$.]

Seek optimal policy $\pi^*$:

$V^{\pi}(s) = E\left[\sum_{t=0}^{T} R(s_t, \pi(s_t)) \mid s_0 = s\right]$
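A minimal sketch of evaluating the finite-horizon value of a fixed policy on a small tabular MDP. The dictionary layout, the function name `evaluate_policy`, and the two-state toy numbers are illustrative assumptions, not taken from the slides.

```python
from typing import Dict, List, Tuple

State = int
Action = int


def evaluate_policy(
    states: List[State],
    M: Dict[Tuple[State, Action], Dict[State, float]],  # M[(s, a)][s2] = P(s2 | s, a)
    R: Dict[Tuple[State, Action], float],               # R[(s, a)] = expected reward
    policy: Dict[State, Action],
    horizon: int,
) -> Dict[State, float]:
    """Return V(s) = E[sum_{t=0}^{horizon} R(s_t, policy(s_t)) | s_0 = s]."""
    # With no steps remaining there is nothing left to collect.
    V = {s: 0.0 for s in states}
    # One backup per time step t = 0, ..., horizon.
    for _ in range(horizon + 1):
        V = {
            s: R[(s, policy[s])]
            + sum(p * V[s2] for s2, p in M[(s, policy[s])].items())
            for s in states
        }
    return V


# Hypothetical two-state chain: state 1 pays reward 1 per step, state 0 pays 0.
states = [0, 1]
M = {(0, 0): {0: 1.0}, (0, 1): {1: 1.0}, (1, 0): {1: 1.0}, (1, 1): {1: 1.0}}
R = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 1.0}
policy = {0: 1, 1: 0}  # move to state 1, then stay there
values = evaluate_policy(states, M, R, policy, horizon=3)
print(values)  # {0: 3.0, 1: 4.0}
```

Seeking the optimal policy would replace the fixed `policy[s]` with a maximization over actions in each backup.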

Page 3

Multi Task Reinforcement Learning (MTRL)

Given: A sequence of Markov Decision Processes drawn from an unknown distribution D.

Goal: Leverage past experience to improve performance on new MDPs drawn from D.

[Figure: distribution D generates Environment M1, Environment M2, ..., Environment Mn.]

Page 4

MTRL Problem

Tasks have hierarchical relationships.
  Set of classes (unknown to the agent).
  Natural means of transfer (class discovery).

[Figure: hierarchy with base distribution $G_0$ at the root, class nodes 1 and 2 beneath it, and the tasks $(M_1, R_1, \pi_1)$, $(M_2, R_2, \pi_2)$, ..., $(M_n, R_n, \pi_n)$ at the leaves.]

Page 5

Hierarchical Bayesian Modeling

Foundation: Dirichlet Process Models
  Unknown number of classes.
  Discover hierarchical structure.

Explicit formulation of uncertainty
  Adapt machinery to the RL setting.
  Well-justified transfer for RL problems.

[Figure: graphical model with base distribution $G_0$, class-level parameters indexed by $c$, and a plate over the $N$ tasks $(M, R, \pi)$.]
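A Dirichlet process prior lets the number of classes grow with the data. One common way to see this is the Chinese restaurant process view of its class assignments, sketched below. The function name, the concentration parameter `alpha`, and the seed are assumptions for illustration; the slides do not specify a sampler.

```python
import random
from typing import List


def sample_crp_assignments(n_tasks: int, alpha: float, rng: random.Random) -> List[int]:
    """Assign tasks to classes one at a time under a Chinese restaurant process.

    Task i joins existing class k with probability proportional to the number
    of tasks already in k, or opens a new class with probability proportional
    to alpha (alpha must be positive).
    """
    assignments: List[int] = []
    counts: List[int] = []  # counts[k] = number of tasks currently in class k
    for _ in range(n_tasks):
        weights = counts + [alpha]  # last entry stands for "open a new class"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


assignments = sample_crp_assignments(10, alpha=1.0, rng=random.Random(0))
print(assignments, "classes used:", len(set(assignments)))
```

Larger values of `alpha` make new classes more likely, which is how the prior expresses uncertainty about how many task classes exist.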

Page 6

Basic Hierarchical Transfer Process

[Figure: transfer pipeline.]
1. Observe tasks $(M_1, R_1, \pi_1)$, $(M_2, R_2, \pi_2)$, ..., $(M_n, R_n, \pi_n)$.
2. Compute posterior over candidate hierarchies (base distribution $G_0$, classes 1, 2, ...) for the observed tasks.
3. Select best hierarchy $\hat{H}$.
4. Process inference on the new task: $P(\hat{M}, \hat{R}, \hat{\pi} \mid s_t, a_{t-1}, s_{t-1}, \hat{H})$.
5. Select actions (Bayesian RL): the new task supplies $s_t$ and the agent returns $a_t$.

Page 7

Hierarchical Bayesian Transfer for RL

Model-Based Multi-Task RL
  Prior model for domain models.
  Action selection: Thompson sampling, planning.
  $P(\hat{M}, \hat{R} \mid s_t, a_{t-1}, s_{t-1}, H)$

Policy-Based Multi-Task RL
  Prior for policy parameters.
  Action selection: Bayesian Policy Search algorithm.
  $P(\hat{\pi} \mid s_t, a_{t-1}, s_{t-1}, H)$

[Figures: in each hierarchy, $G_0$ sits above classes 1, 2, and a possible new class; the new task's $(\hat{M}_{n+1}, \hat{R}_{n+1})$ (model-based) or $\hat{\pi}_{n+1}$ (policy-based) is attached to one of them.]

Page 8

Model-Based MTRL

Explicitly model the generative process D.
Hierarchy represents classes of MDPs.

[Figure: observed tasks $(M_1, R_1)$, $(M_2, R_2)$, ..., $(M_n, R_n)$ are used to estimate D. The estimate is a hierarchy with $G_0$ at the root, a class prior over classes 1 and 2, and MDPs $(M_1, R_1)$, $(M_2, R_2)$, $(M_3, R_3)$, $(M_4, R_4)$ grouped under the classes.]

Page 9

Action Selection: Exploit estimate of D

Exploit the refined prior (class information).
Sample the MDPs using Thompson Sampling.
Plan with the sampled model (Value Iteration).

[Figure: action-selection loop on the new task.]
1. Observe $s_t$ from the new task.
2. Compute posterior: $P(\hat{M}, \hat{R} \mid s_{0:t}, a_{0:t}, r_{0:t}, \hat{D})$.
3. Sample $\hat{M}, \hat{R}$ from the posterior.
4. Plan: $\pi^* = \mathrm{Plan}(\hat{M}, \hat{R})$.
5. Execute $a_t$.
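The sample-then-plan loop can be sketched on a toy tabular problem. This is a simplified illustration under stated assumptions, not the authors' implementation: rewards are treated as known, the posterior over transitions is an independent Dirichlet per state-action pair (the pseudo-counts stand in for the refined class prior), planning uses discounted value iteration with a hypothetical `gamma`, and all names and numbers are made up.

```python
import random
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (state, action)


def sample_model(counts: Dict[Key, List[float]], rng: random.Random) -> Dict[Key, List[float]]:
    """Thompson step: draw one transition model from the Dirichlet posteriors."""
    model = {}
    for key, alphas in counts.items():
        draws = [rng.gammavariate(a, 1.0) for a in alphas]  # normalized gammas ~ Dirichlet
        total = sum(draws)
        model[key] = [d / total for d in draws]
    return model


def q_value(model, R, V, s, a, gamma):
    return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in enumerate(model[(s, a)]))


def plan(model, R, n_states, n_actions, gamma=0.9, sweeps=100) -> List[int]:
    """Value iteration on the sampled model; returns a greedy policy."""
    V = [0.0] * n_states
    for _ in range(sweeps):
        V = [max(q_value(model, R, V, s, a, gamma) for a in range(n_actions))
             for s in range(n_states)]
    return [max(range(n_actions), key=lambda a: q_value(model, R, V, s, a, gamma))
            for s in range(n_states)]


def thompson_run(true_model, R, counts, n_states, n_actions, steps, rng) -> float:
    s, total_reward = 0, 0.0
    for _ in range(steps):
        sampled = sample_model(counts, rng)             # sample an MDP from the posterior
        policy = plan(sampled, R, n_states, n_actions)  # plan with the sampled model
        a = policy[s]
        s_next = rng.choices(range(n_states), weights=true_model[(s, a)])[0]
        counts[(s, a)][s_next] += 1.0                   # posterior update from the transition
        total_reward += R[(s, a)]
        s = s_next
    return total_reward


# Hypothetical two-state, two-action task.
n_states, n_actions = 2, 2
true_model = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8], (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]}
R = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
counts = {key: [1.0, 1.0] for key in true_model}  # prior pseudo-counts
total = thompson_run(true_model, R, counts, n_states, n_actions, steps=20, rng=random.Random(0))
print(total, counts)
```

In the transfer setting, the initial `counts` would come from the inferred class of the new task rather than from a flat prior, which is where earlier tasks help.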

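Page 10

Domain 1

State is a bit vector: $s = [b_{u,1}, \ldots, b_{u,c}, b_{d,1}, \ldots, b_{d,c}, \ldots, b_{r,c}]$

True reward function: $r \sim N(w \cdot s, \sigma^2)$

Set of 20 test maps.

[Figure: example map with start state $S_0$.]

[Figure: generative model. Class parameters are drawn from $G_0$, a Normal-Inverse-Wishart (NIW) distribution; the reward weights $w$ of each MDP $M_j$ are drawn from the Normal distribution of its class.]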

Page 11

Domain 1

[Plot: results comparing No Transfer with transfer from 16 previous tasks.]

Page 12

Policy-Based MTRL

Policy prior.
Infer policy components.
Hierarchy represents reusable policy components.

[Figure: observed tasks 1, 2, ..., n are used to estimate the hierarchy H: $G_0$ at the root, a class prior over classes 1 and 2, and policy components 1 to 4 grouped under the classes.]

Page 13

Consider Wargus RTS
  Multiple unit types.
  Units fulfill tactical roles.
  Roles are useful in multiple maps.
  Simple -> hard instances.

Hierarchical policy prior.
  Facilitate reuse of roles.

Page 14

Role Based Policies

Set of roles: $(\theta_1, \ldots, \theta_k)$
  Vectors of policy parameters.
  Who to attack.

Set of role assignments: $c = [c_1, \ldots, c_u]$, $c_j \in \{1, \ldots, k\}$

A strategy for assigning agents to roles: $P(c_u \mid s, \phi)$
  Assignment depends on state features.

Executing role-based policy
1. Make the assignment.
2. Each agent selects action.

$P(a \mid s, c) = \prod_u P(a_u \mid s, \theta_{c_u})$
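The two execution steps can be sketched as follows: first sample a role for each agent from state features, then let each agent pick a target using its role's parameters. The softmax form of both distributions, the feature vectors, and every name here (`theta`, `phi`, `execute_role_policy`) are assumptions for illustration; the slides state only that assignment depends on state features and that the joint action probability factors over agents.

```python
import math
import random
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def role_probs(phi: List[List[float]], agent_features: List[float]) -> List[float]:
    """P(c_u = j | s): one score per role from the agent's state features."""
    return softmax([dot(phi_j, agent_features) for phi_j in phi])


def action_probs(theta_role: List[float], target_features: List[List[float]]) -> List[float]:
    """P(a_u | s, role parameters): which target to attack."""
    return softmax([dot(theta_role, f) for f in target_features])


def execute_role_policy(theta, phi, agent_features, target_features,
                        rng: random.Random) -> Tuple[List[int], List[int], float]:
    roles, actions, joint_prob = [], [], 1.0
    for feats in agent_features:
        # 1. Make the assignment for this agent.
        c_u = rng.choices(range(len(theta)), weights=role_probs(phi, feats))[0]
        # 2. The agent selects an action with its role's parameters.
        probs = action_probs(theta[c_u], target_features)
        a_u = rng.choices(range(len(probs)), weights=probs)[0]
        roles.append(c_u)
        actions.append(a_u)
        joint_prob *= probs[a_u]  # P(a | s, c) is a product over agents
    return roles, actions, joint_prob


# Hypothetical numbers: 2 roles, 3 agents, 2 enemy targets.
theta = [[2.0, 0.0], [0.0, 2.0]]      # role 0 prefers feature 0, role 1 prefers feature 1
phi = [[1.0, -1.0], [-1.0, 1.0]]      # role-assignment weights
agent_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
target_features = [[1.0, 0.2], [0.1, 1.0]]
roles, actions, joint_prob = execute_role_policy(
    theta, phi, agent_features, target_features, random.Random(1))
print(roles, actions, joint_prob)
```

Because roles are stored separately from the assignment function, the same `theta` vectors can be reused on a map with a different number of agents or targets.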

Page 15

Transfer of Role-Based Policies

Bayesian Policy Search learns:
  Individual role parameters.
  Role assignment function.
  Assignments of agents to roles.

Sample role-based policies
  Construct an artificial distribution [Hoffman et al. NIPS 2007, Muller Bayes Stats. 1999]:
  $q(\theta) \propto V(\theta) P(\theta) = P(\theta) \int R(\xi) P(\xi \mid \theta) \, d\xi$
  Search using stochastic simulation.
  Model free.

[Figure: Bayesian Policy Search takes the hierarchy H ($G_0$, classes 1 and 2) and a simulator of the new task.]
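Sampling from a distribution proportional to value times prior concentrates samples on high-value policies that the prior also supports. The sketch below illustrates that idea with a plain random-walk Metropolis-Hastings sampler on a one-dimensional toy problem. It is not the Bayesian Policy Search algorithm itself: the value function is a made-up closed form rather than a simulation estimate, the prior is a standard normal, and all names are hypothetical.

```python
import math
import random
from typing import Callable, List


def log_prior(theta: float) -> float:
    """Standard normal prior on the policy parameter (assumption), up to a constant."""
    return -0.5 * theta * theta


def log_value(theta: float) -> float:
    """Toy positive value function V(theta) = exp(-(theta - 2)^2) (assumption)."""
    return -(theta - 2.0) ** 2


def log_q(theta: float) -> float:
    """Log of the artificial distribution q(theta), proportional to V(theta) * P(theta)."""
    return log_value(theta) + log_prior(theta)


def metropolis_hastings(log_target: Callable[[float], float], theta0: float,
                        n_samples: int, step: float, rng: random.Random) -> List[float]:
    theta, log_cur = theta0, log_target(theta0)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)  # symmetric random-walk proposal
        log_prop = log_target(proposal)
        # Accept with probability min(1, q(proposal) / q(theta)).
        if rng.random() < math.exp(min(0.0, log_prop - log_cur)):
            theta, log_cur = proposal, log_prop
        samples.append(theta)
    return samples


samples = metropolis_hastings(log_q, theta0=0.0, n_samples=20000, step=1.0,
                              rng=random.Random(0))
kept = samples[1000:]  # drop burn-in
mean_theta = sum(kept) / len(kept)
print(round(mean_theta, 2))  # close to 4/3, between the prior mean 0 and the value peak 2
```

For this toy choice the target is Gaussian with mean 4/3, so the sample mean gives a quick check that the sampler is pulled toward the value peak without leaving the prior's support.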

Page 16

Experiments

Tactical battles in Wargus

Transfer given expert examples.

Learning without expert examples.

Page 17

Transfer from expert play.

Page 18

Transfer from self play
  Use BPS on Training Map 1.
  Transfer to new map.

Page 19

Conclusion

Hierarchical Bayesian Modeling for RL Transfer

Model-Based MTRL
  Learn classes of domain models.
  Transfer: Improved priors for model-based Bayesian RL.

Policy-Based MTRL
  Learn reusable policies.
  Transfer: Recombine learned policy components in new tasks.
  Solved tactical games in Wargus.

Page 20

Thank You

Page 21

Outline

Multi-Task Reinforcement Learning (RL)
  Markov Decision Processes.
  Multi-task RL setting.

Policy-Based Multi-task RL
  Discover classes of policy components.
  Bayesian Policy Search algorithm.

Conclusion

Page 22

Policy-Based MTRL

Observed property: Bags of trajectories.
Transfer: Classes of policy components.
Means of exploiting transferred information: Recombine existing components in new tasks.
Consequence: Components reused to learn hard tasks.

Page 23

Outline

Markov Decision Processes
Bayesian Model Based Reinforcement Learning
Multi Task Reinforcement Learning (MTRL)
Modeling the MTRL Problem
MTRL Transfer Algorithm
  Estimating parameters of the generative process.
  Action selection.
Results
Conclusion

Page 24

Bayesian Model Based RL

Given prior: $P(M)$
Plan using updated model.

1. Most work uses uninformed priors.
2. Selection of prior not supported by data.
3. Priors do not facilitate transfer.

[Figure: Bayesian RL loop. The environment sends $s_t$; the prior $P(M)$ is updated to the posterior $P(M \mid s_t, a_{t-1}, s_{t-1})$; the agent acts by maximizing value under the model, $\max_a V(s_t, M)$, and returns $a_t$.]
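The posterior on this slide follows from Bayes' rule applied to one observed transition. Written out, under the assumptions that $M$ denotes the transition model and that the prior $P(M)$ does not depend on the previous state and action, the update is:

```latex
P(M \mid s_t, a_{t-1}, s_{t-1})
  = \frac{P(s_t \mid s_{t-1}, a_{t-1}, M)\, P(M)}
         {\int P(s_t \mid s_{t-1}, a_{t-1}, M')\, P(M')\, dM'}
```

The likelihood term is simply the probability the candidate model assigns to the transition that was actually observed, so an informed prior $P(M)$ shifts the posterior toward good models after fewer transitions than an uninformed one.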