Exploiting Coordination Locales in DisPOMDPs via Social Model Shaping

Pradeep Varakantham, Singapore Management University
Joint work with J.Y. Kwak, M. Taylor, J. Marecki, P. Scerri, M. Tambe

Posted 25-Feb-2016

TRANSCRIPT

Page 1: Exploiting Coordination Locales in DisPOMDPs via Social Model Shaping

Pradeep Varakantham, Singapore Management University

Joint work with J.Y. Kwak, M. Taylor, J. Marecki, P. Scerri, M. Tambe

Page 2: Exploiting Coordination Locales in  DisPOMDPs  via Social Model Shaping

Motivating Domains

Disaster RescueSensor Networks

Characteristics of Domains: Uncertainty Coordinating multiple agents Sequential decision making

Page 3: Meeting the Challenges

Problem: Multiple agents coordinating to perform multiple tasks in the presence of uncertainty

Solution: Represent as a Distributed POMDP and solve
- Computing the optimal solution is NEXP-complete
- An approximate algorithm dynamically exploits structure in interactions

Result: Vast improvement in performance over existing algorithms

Page 4: Outline

Illustrative Domain

Model

Approach: Exploit dynamic structure in interactions

Results

Page 5: Illustrative Domain

- Multiple types of robots
- Uncertainty in movements
- Reward components: saving victims, collisions, clearing debris
- Goal: maximize expected joint reward

Page 6: Model

DisPOMDPs with Coordination Locales (DPCL)
- Joint model: <S, A, Ω, P, R, O, Ag>
- Global state represents completion of tasks
- Agents are independent except in coordination locales (CLs)
- Two types of CLs:
  - Same-time CL (e.g., agents colliding with each other)
  - Future-time CL (e.g., a cleaner robot clearing debris assists a rescue robot in reaching its goal)
- Individual observability
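The DPCL tuple above can be written down as a small container type. The sketch below is purely illustrative: the field names, the factored `(s_g, s_i)` state encoding, and the per-agent dictionaries are assumptions for exposition, not the paper's actual data structures.

```python
# Hypothetical sketch of the DPCL tuple <S, A, Omega, P, R, O, Ag>.
# Names and container choices are illustrative, not from the paper.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Tuple[str, str]          # (global task-state s_g, local state s_i)
Action = str
Observation = str

@dataclass
class DPCL:
    agents: List[str]                                 # Ag
    states: List[State]                               # S, factored per agent
    actions: Dict[str, List[Action]]                  # A_i for each agent
    observations: Dict[str, List[Observation]]        # Omega_i for each agent
    P: Callable[[State, Action, State], float]        # individual transition fn
    R: Callable[[State, Action, State], float]        # individual reward fn
    O: Callable[[Observation, Action, State], float]  # individual observation fn
    # Coordination locales: <state, action> pairs where agents interact
    same_time_CLs: List[Tuple[State, Action]] = field(default_factory=list)
    future_time_CLs: List[Tuple[State, Action]] = field(default_factory=list)
```

Outside the CL lists, each agent's P, R, and O are purely individual, which is what lets TREMOR solve single-agent POMDPs in its inner loop.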

Page 7: Solving DPCLs with TREMOR

TREMOR: Teams REshaping of MOdels for Rapid execution

Two steps:
1. Branch and Bound search
   - MDP-based heuristics
2. Task Assignment evaluation
   - Compute policies for every agent
   - Perform joint policy computation only at CLs

Page 8: 1. Branch and Bound Search
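The slide's diagram is lost in this transcript, but the step it depicts can be sketched generically: search over task assignments, pruning branches whose heuristic upper bound cannot beat the incumbent. Everything here is a hypothetical stand-in; in particular, `heuristic_upper_bound` stands in for the MDP-based heuristics the slide mentions and is assumed admissible (never underestimates).

```python
# Illustrative branch-and-bound over task assignments. The helpers passed
# in (heuristic_upper_bound, evaluate) are hypothetical stand-ins for
# TREMOR's MDP-based heuristic and its task-assignment evaluation step.
def branch_and_bound(agents, tasks, heuristic_upper_bound, evaluate):
    """Return the best full task assignment found and its value.

    heuristic_upper_bound(partial) -> optimistic value of completing `partial`
    evaluate(full)                 -> actual joint value of a full assignment
    """
    best_value, best_assignment = float("-inf"), None
    stack = [{}]  # each node is a partial assignment: agent -> task
    while stack:
        partial = stack.pop()
        if heuristic_upper_bound(partial) <= best_value:
            continue                        # prune: cannot beat incumbent
        if len(partial) == len(agents):     # leaf: full assignment
            value = evaluate(partial)
            if value > best_value:
                best_value, best_assignment = value, partial
            continue
        agent = agents[len(partial)]        # branch on the next unassigned agent
        for task in tasks:
            stack.append({**partial, agent: task})
    return best_assignment, best_value
```

With an admissible heuristic the pruning is safe, so the returned assignment is optimal over the enumerated space; a tighter heuristic simply prunes more.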

Page 9: 2. Task Assignment Evaluation

Until convergence of policies or maximum iterations:
1) Solve individual POMDPs
2) Identify potential coordination locales
3) Based on the type and value of coordination, shape P and R of the relevant individual agents
   - Capture interactions
   - Encourage/discourage interactions
4) Go to step 1

Page 10: Identifying Potential CLs

CL = <state, action>

Probability of a CL occurring at a time step T, given the starting belief:
- Standard belief update given the policy
- Policy is over belief states
- Probability of observing ω in belief state b
- Updating b
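The equations on this slide were figures and did not survive the transcript; the labels ("probability of observing ω in belief state b", "updating b") match the textbook POMDP belief update, which is presumably what was shown:

```latex
% Standard POMDP belief update (reconstructed from the slide's labels,
% using the document's P(s,a,s') / O(\omega,a,s') notation).
% Probability of observing \omega from belief b after action a:
P(\omega \mid b, a) = \sum_{s'} O(\omega, a, s') \sum_{s} P(s, a, s')\, b(s)
% Updated belief over the next state s':
b'(s') = \frac{O(\omega, a, s') \sum_{s} P(s, a, s')\, b(s)}{P(\omega \mid b, a)}
```

Propagating the belief forward through the policy in this way gives, at each time step, the probability mass on each <state, action> pair, i.e. the probability that a given CL occurs.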

Page 11: Type of CL

STCL, if there exist s and a for which the transition/reward function is not decomposable:
  P(s,a,s') ≠ ∏_{1≤i≤N} P((s_g,s_i), a_i, (s_g',s_i'))  OR
  R(s,a,s') ≠ Σ_{1≤i≤N} R((s_g,s_i), a_i, (s_g',s_i'))

FTCL, if completion of a task (global state) by an agent at t' affects the transitions/rewards of other agents at t

Page 12: Shaping Model (STCL)

- Shaping the transition function
- Shaping the reward function
- Joint transition probability when the CL occurs
- New transition probability for agent i
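The shaping equations on this slide were figures lost in the transcript. A plausible minimal sketch of the idea, assuming shaping mixes agent i's individual dynamics with the joint-model outcome weighted by the probability that the CL actually occurs (the function name and this exact blending form are my assumptions, not the paper's formula):

```python
# Hedged sketch of STCL transition shaping: agent i's individual
# transition probability is blended with the outcome marginalized from
# the joint model, weighted by p_cl, the probability (under current
# policies) that the other agents occupy the locale. Illustrative only.
def shape_transition(p_individual, p_joint_given_cl, p_cl):
    """Blend agent i's transition model with the CL outcome.

    p_individual     : P_i(s, a, s') ignoring the other agents
    p_joint_given_cl : P_i(s, a, s') marginalized from the joint model,
                       given that the CL occurs (e.g. a collision)
    p_cl             : probability the CL occurs under current policies
    """
    return (1.0 - p_cl) * p_individual + p_cl * p_joint_given_cl
```

Reward shaping would follow the same pattern, which is how the social effect of other agents enters each agent's otherwise individual model.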

Page 13: Results

Benchmark algorithms:
- Independent POMDPs
- Memory Bounded Dynamic Programming (MBDP)

Criteria:
- Decision quality
- Run-time

Parameters varied: (i) agents; (ii) CLs; (iii) states; (iv) horizon

Page 14: State space (results chart)

Page 15: Agents (results chart)

Page 16: Coordination Locales (results chart)

Page 17: Time Horizon (results chart)

Page 18: Related Work

Existing research:
- DEC-MDPs: assume individual or collective full observability; take task allocation and dependencies as input
- DEC-POMDPs: JESP, MBDP; exploit independence in transition/reward/observation
- Model shaping: Guestrin and Gordon, 2002

Page 19: Conclusion

- DPCL, a specialization of Distributed POMDPs
- TREMOR exploits the presence of few CLs in domains
- TREMOR depends on single-agent POMDP solvers
- Results: TREMOR outperformed DisPOMDP algorithms, except in tightly coupled small problems

Page 20: Questions?

Page 21: Same Time CL (STCL)

There is an STCL if any of the following fails to decompose:
- Transition function: P(s,a,s') ≠ ∏_{1≤i≤N} P((s_g,s_i), a_i, (s_g',s_i'))
- Observation function: O(s',a,o) ≠ ∏_{1≤i≤N} O(o_i, a_i, (s_g',s_i'))
- Reward function: R(s,a,s') ≠ Σ_{1≤i≤N} R((s_g,s_i), a_i, (s_g',s_i'))

Example: two robots colliding in a narrow corridor
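The transition-function condition above can be tested directly for a given <state, action, next state> triple: compare the joint probability against the product of the individual probabilities. The sketch below is illustrative (the function names and tolerance are assumptions); it shows only the transition check, with the observation and reward checks following the same pattern.

```python
# Illustrative check that a <state, action> pair is a same-time CL:
# the joint transition probability fails to factor into the product
# of the agents' individual transition probabilities.
def is_same_time_cl(joint_P, individual_Ps, s, a, s_next, tol=1e-9):
    """True if P(s,a,s') != prod_i P_i((s_g,s_i), a_i, (s_g',s_i'))."""
    product = 1.0
    for P_i in individual_Ps:
        product *= P_i(s, a, s_next)
    return abs(joint_P(s, a, s_next) - product) > tol
```

In the corridor example, each robot alone would pass through with some probability, but the joint model assigns that outcome a different (lower) probability because they collide, so the factorization fails and the pair is flagged as an STCL.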

Page 22: Future Time CL (FTCL)

Actions of one agent at t' can affect the transitions, observations, or rewards of other agents at t:
- P((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t) | a_j^{t'}) ≠ P((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t)), for some t' < t
- R((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t) | a_j^{t'}) ≠ R((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t)), for some t' < t
- O(ω_i^t, a_i^t, (s_g'^t, s_i'^t) | a_j^{t'}) ≠ O(ω_i^t, a_i^t, (s_g'^t, s_i'^t)), for some t' < t

Example: clearing debris assists rescue robots in getting to victims faster