![Page 1: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/1.jpg)
D-TREMOR - AAMAS2011 1
Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents
Prasanna Velagapudi
Pradeep Varakantham
Paul Scerri
Katia Sycara
![Page 2: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/2.jpg)
Motivation
• 100s to 1000s of robots, agents, people
• Complex, collaborative tasks
• Dynamic, uncertain environment
• Offline planning
Domains: Search & Rescue, Military C2, Disaster Response, Convoy Planning
![Page 3: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/3.jpg)
Motivation
• Exploit three characteristics of these domains
1. Explicit Interactions
  • Specific combinations of states and actions where effects depend on more than one agent
2. Sparsity of Interactions
  • Many potential interactions could occur between agents
  • Only a few will occur in any given solution
3. Distributed Computation
  • Each agent has access to local computation
  • A centralized algorithm has access to 1 unit of computation
  • A distributed algorithm has access to N units of computation
![Page 4: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/4.jpg)
Review: Dec-POMDP
• Joint transition function
• Joint reward function
• Joint observation function
![Page 5: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/5.jpg)
Distributed POMDP with Coordination Locales
[Varakantham, et al 2009]
A coordination locale (CL) is defined by:
• Time constraint
• Relevant region of joint state-action space
• Nature of time constraint (e.g. affects only same-time, affects any future-time)
![Page 6: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/6.jpg)
Distributed POMDP with Coordination Locales
[Varakantham, et al 2009]
![Page 7: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/7.jpg)
D-TREMOR (extending TREMOR [Varakantham, et al 2009])
• Task Allocation: Decentralized auction
• Local Planning: EVA POMDP solver
• Interaction Exchange: Policy sub-sampling and Coordination Locale (CL) messages
• Model Shaping: Prioritized/randomized reward and transition shaping
![Page 8: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/8.jpg)
D-TREMOR: Task Allocation
• Assign “tasks” using decentralized auction
  – Greedy, nearest allocation
• Create local, independent sub-problem:
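A minimal sketch of a greedy, nearest-first allocation. It assumes agents and tasks are grid positions and that "nearest" means Manhattan distance; the single sorted loop stands in for the decentralized auction, and every name here is hypothetical rather than taken from the D-TREMOR implementation.

```python
def manhattan(a, b):
    """Grid distance between two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_nearest_allocation(agents, tasks):
    """agents, tasks: dict name -> (x, y). Returns dict task -> agent."""
    # Every (agent, task) pair "bids" its distance; the cheapest bids win first.
    bids = sorted(
        (manhattan(pos_a, pos_t), task, agent)
        for agent, pos_a in agents.items()
        for task, pos_t in tasks.items()
    )
    allocation, busy = {}, set()
    for _, task, agent in bids:
        if task not in allocation and agent not in busy:
            allocation[task] = agent
            busy.add(agent)
    return allocation

agents = {"r1": (0, 0), "r2": (5, 5)}
tasks = {"victim_a": (1, 0), "victim_b": (4, 5)}
allocation = greedy_nearest_allocation(agents, tasks)
```

Each agent would then build its local sub-problem around only the tasks it won.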
![Page 9: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/9.jpg)
D-TREMOR: Local Planning
• Solve using off-the-shelf algorithm (EVA)
• Result: locally-optimal policies
![Page 10: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/10.jpg)
D-TREMOR: Interaction Exchange
• Find PrCLi and ValCLi [Kearns 2002]:
  – Entered corridor in 95 of 100 runs: PrCLi = 0.95
  – No collision: +1, Collision: -6, so ValCLi = -7
• Send CL messages to teammates
![Page 11: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/11.jpg)
D-TREMOR: Model Shaping
• Shape local model rewards/transitions based on interactions
  – Probability of interaction
  – Interaction model functions
  – Independent model functions
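The equation on this slide did not survive extraction. One plausible reading of its three labelled terms is a probability-weighted blend of the interaction and independent model functions; the sketch below illustrates that assumption only, with hypothetical names and numbers.

```python
def shape(pr_cl, interaction_value, independent_value):
    """Blend one model entry (a reward or a transition probability).

    Assumed form: pr_cl * interaction + (1 - pr_cl) * independent.
    """
    return pr_cl * interaction_value + (1.0 - pr_cl) * independent_value

# Hypothetical example: a teammate enters the corridor with probability 0.95.
pr_cl = 0.95
shaped_reward = shape(pr_cl, interaction_value=-6.0, independent_value=1.0)

# Blending two distributions entry by entry keeps the row normalized.
independent_row = {"corridor": 0.8, "stay": 0.2}
interaction_row = {"corridor": 0.4, "stay": 0.6}
shaped_row = {
    s: shape(pr_cl, interaction_row[s], independent_row[s])
    for s in independent_row
}
```

Because the blend is a convex combination, a shaped transition row remains a valid probability distribution for any PrCL in [0, 1].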
![Page 12: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/12.jpg)
D-TREMOR: Local Planning (again)
• Re-solve shaped local models to get new policies
• Result: new locally-optimal policies → new interactions
![Page 13: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/13.jpg)
D-TREMOR: Adv. Model Shaping
• In practice, we run into three common issues faced by concurrent optimization algorithms:
  – Slow convergence
  – Oscillation
  – Local optima
• We can alter our model-shaping to mitigate these by reasoning about the types of interactions we have
![Page 14: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/14.jpg)
D-TREMOR: Adv. Model Shaping
• Slow convergence → Prioritization
  – Assign priorities to agents, only model-shape collision interactions for higher priority agents
  – Can quickly resolve purely negative interactions
    • Negative interaction: when every agent is guaranteed to have a lower-valued local policy if an interaction occurs
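A sketch of how prioritized shaping might filter incoming CL messages. It assumes that a lower number means higher priority, that only collision CLs are filtered, and that messages are dicts with the hypothetical keys shown; none of these details come from the slides.

```python
def cls_to_shape(my_priority, received_cls):
    """Keep collision CLs only when the sender outranks this agent.

    Non-collision CLs pass through unchanged (an assumption of this sketch).
    """
    return [
        cl for cl in received_cls
        if cl["kind"] != "collision" or cl["sender_priority"] < my_priority
    ]

received = [
    {"sender_priority": 1, "kind": "collision", "pr": 0.95, "val": -7.0},
    {"sender_priority": 3, "kind": "collision", "pr": 0.50, "val": -7.0},
    {"sender_priority": 3, "kind": "debris", "pr": 0.80, "val": 4.0},
]
kept = cls_to_shape(my_priority=2, received_cls=received)
```

With this rule the agent of priority 2 yields to agent 1 and ignores the collision reported by agent 3, so exactly one side of each collision has to re-plan.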
![Page 15: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/15.jpg)
D-TREMOR: Adv. Model Shaping
• Oscillation → Probabilistic shaping
– Often caused by time dynamics between agents
  • Agent 1 shapes based on Agent 2’s old policy
  • Agent 2 shapes based on Agent 1’s old policy
– Each agent only applies model-shaping with probability δ [Zhang 2005]
– Breaks out of cycles between agent policies
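A minimal sketch of probabilistic shaping: each agent applies the shaping step only with probability δ, so two agents are unlikely to keep updating in lockstep. The function names and the toy "model" (a counter) are hypothetical.

```python
import random

def maybe_shape(model, received_cls, delta, shape_fn, rng=random):
    """Apply shape_fn with probability delta; otherwise keep the current model."""
    if rng.random() < delta:
        return shape_fn(model, received_cls)
    return model

# Usage with a toy model: count how often shaping is applied in 10000 trials.
rng = random.Random(0)
applied = sum(
    maybe_shape(0, [], delta=0.3, shape_fn=lambda m, cls: m + 1, rng=rng)
    for _ in range(10000)
)
```

Setting δ = 1 recovers plain shaping on every iteration, while smaller values trade convergence speed for fewer synchronized flips.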
![Page 16: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/16.jpg)
D-TREMOR: Adv. Model Shaping
• Local Optima → Optimistic initialization
  – Agents cannot detect mixed interactions (e.g. debris)
    • Rescue agent policies can only improve if debris is cleared
    • Cleaner agent policies can only worsen if they clear debris
“I’m not going near the debris”
“If no one is going through debris, I won’t clear it”
“I’m not clearing the debris”
![Page 17: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/17.jpg)
D-TREMOR: Adv. Model Shaping
• Local Optima → Optimistic initialization
  – Agents cannot detect mixed interactions (e.g. debris)
    • Rescue agent policies can only improve if debris is cleared
    • Cleaner agent policies can only worsen if they clear debris
  – Let each agent solve an initial model that uses an optimistic assumption of interaction condition
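One way to read "optimistic assumption" is sketched below: before any messages arrive, an agent assumes that helpful interactions will happen and harmful ones will not. The rule, the dictionary layout, and the example values are assumptions for illustration, not the slides' definition.

```python
def optimistic_initial_pr(potential_cls):
    """potential_cls: dict name -> change in local value if the interaction occurs.

    Returns an assumed initial interaction probability per CL.
    """
    return {
        name: (1.0 if delta_value > 0 else 0.0)
        for name, delta_value in potential_cls.items()
    }

# A rescue agent initially plans as if the debris will be cleared for it.
initial_pr = optimistic_initial_pr(
    {"debris_cleared": 4.0, "corridor_collision": -5.0}
)
```

Planning through the debris in the first iteration makes the rescue agent report that CL, which gives the cleaner agent a reason to clear it.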
![Page 18: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/18.jpg)
Experimental Setup
• D-TREMOR policies
  – Max-joint-value
  – Last iteration
• Comparison policies
  – Independent
  – Optimistic
  – Do-nothing
  – Random
• Scaling:
  – 10 to 100 agents
  – Random maps
• Density
  – 100 agents
  – Concentric ring maps
• 3 problems/condition
• 20 planning iterations
• 7 time step horizon
• 1 CPU per agent
D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs.
(with some caveats)
![Page 19: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/19.jpg)
Experimental Datasets
Scaling Dataset Density Dataset
![Page 20: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/20.jpg)
Experimental Results: Scaling
Naïve Policies
D-TREMOR Policies
![Page 21: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/21.jpg)
Experimental Results: Density
• D-TREMOR rescues the most victims (+10 ea.)
• D-TREMOR does not resolve every collision (-5 ea.)
![Page 22: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/22.jpg)
Experimental Results: Time
Increase in time related to # of CLs, not # of agents
Plot axis: # of CLs Active
![Page 23: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/23.jpg)
Conclusions
• D-TREMOR: Decentralized planning for sparse Dec-POMDPs with many agents
• Demonstrated complete distributability, fast heuristic interaction detection, and local message exchange to achieve high scalability
• Empirical results in simulated search and rescue domain
![Page 24: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/24.jpg)
Future Work
• Generalized framework for distributed planning under uncertainty through iterative message exchange
• Optimality/convergence bounds
• Reduce necessary communication
• Better search over task allocations
• Scaling to larger team sizes
![Page 25: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/25.jpg)
Questions?
![Page 27: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/27.jpg)
Motivation
• Scaling planning to large teams is hard
  – Need to plan (with uncertainty) for each agent in team
  – Agents must consider the actions of a growing number of teammates
  – Full, joint problem is NEXP-complete [Bernstein 2002]
• Optimality is going to be infeasible
• Find and exploit structure in the problem
• Make good plans in reasonable amount of time
![Page 29: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/29.jpg)
Experimental Results: Density
Do-nothing does the best?
Ignoring interactions = poor performance
![Page 30: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/30.jpg)
Experimental Results: Time
Why is this increasing?
![Page 31: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/31.jpg)
Related Work
(Chart axes: Scalability, Optimality, Generality; planners plotted: EDI-CR, TD-Dec-POMDP, TREMOR, Dynamic Networks, JESP, Prioritized Planning, D-TREMOR, DPC, OC-Dec-MDP, SPIDER, Optimal Decoupling)
Structured Dec-(PO)MDP planners
  – JESP [Nair 2003]
  – TD-Dec-POMDP [Witwicki 2010]
  – EDI-CR [Mostafa 2009]
  – SPIDER [Marecki 2009]
• Restrict generality slightly to get scalability
• High optimality
![Page 32: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/32.jpg)
Related Work
Heuristic Dec-(PO)MDP planners
  – TREMOR [Varakantham 2009]
  – OC-Dec-MDP [Beynier 2005]
• Sacrifice optimality for scalability
• High generality
![Page 33: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/33.jpg)
Related Work
Structured multiagent path planners
  – DPC [Bhattacharya 2010]
  – Optimal Decoupling [Van den Berg 2009]
• Sacrifice generality further to get scalability
• High optimality
![Page 34: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/34.jpg)
Related Work
Heuristic multiagent path planners
  – Dynamic Networks [Clark 2003]
  – Prioritized Planning [Van den Berg 2005]
• Sacrifice optimality to get scalability
![Page 35: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/35.jpg)
Related Work
Our approach:
• Fix high scalability and generality
• Explore what level of optimality is possible
![Page 36: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/36.jpg)
A Simple Rescue Domain
Rescue Agent
Cleaner Agent
Narrow Corridor
Victim
Unsafe Cell
Clearable Debris
![Page 37: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/37.jpg)
A Simple (Large) Rescue Domain
![Page 38: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/38.jpg)
Distributed POMDP with Coordination Locales (DPCL)
• Often, interactions between agents are sparse
Only fits one agent
Passable if cleaned
[Varakantham, et al 2009]
![Page 39: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/39.jpg)
Distributed, Iterative Planning
• Inspiration:
  – TREMOR [Varakantham 2009]
  – JESP [Nair 2003]
• Reduce the full joint problem into a set of smaller, independent sub-problems
• Solve independent sub-problems with local algorithm
• Modify sub-problems to push locally optimal solutions towards high-quality joint solution
![Page 40: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/40.jpg)
Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR)
• Reduce the full joint problem into a set of smaller, independent sub-problems (one for each agent)
• Solve independent sub-problems with existing state-of-the-art algorithms
• Modify sub-problems such that local optimum solution approaches high-quality joint solution
Task Allocation
Local Planning
Interaction Exchange
Model Shaping
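The four phases can be read as a loop: allocate once, then repeatedly plan, exchange interactions, and shape. The skeleton below is a sequential sketch of that reading; in the real system each agent would run its own copy concurrently, and every callable passed in is a hypothetical stand-in.

```python
def d_tremor_sketch(agents, allocate, plan, exchange, shape, iterations=20):
    """Skeleton of the four-phase loop with pluggable phase functions."""
    models = allocate(agents)                              # Task Allocation
    policies = {}
    for _ in range(iterations):
        policies = {a: plan(models[a]) for a in agents}    # Local Planning
        messages = exchange(policies)                      # Interaction Exchange
        models = {                                         # Model Shaping
            a: shape(models[a], messages.get(a, [])) for a in agents
        }
    return policies

# Toy stand-ins: a "model" is an int, shaping adds the number of messages.
policies = d_tremor_sketch(
    ["a", "b"],
    allocate=lambda ags: {a: 0 for a in ags},
    plan=lambda model: ("policy", model),
    exchange=lambda pols: {a: ["cl"] for a in pols},
    shape=lambda model, msgs: model + len(msgs),
    iterations=3,
)
```

Keeping the phases as separate callables mirrors the idea that each one can be swapped for a different mechanism or solver.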
![Page 44: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/44.jpg)
D-TREMOR: Interaction Exchange
Finding PrCLi
• Evaluate local policy
• Compute frequency of associated si, ai [Kearns 2002]
  – Entered corridor in 95 of 100 runs: PrCLi = 0.95
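A small sketch of the frequency estimate described above, assuming each sampled run is a list of (state, action) pairs and a CL is identified by one such pair. The trajectory format and names are hypothetical.

```python
def estimate_pr_cl(trajectories, cl_state, cl_action):
    """Fraction of sampled runs that visit the CL's (state, action) pair."""
    hits = sum(1 for traj in trajectories if (cl_state, cl_action) in traj)
    return hits / len(trajectories)

# 95 of 100 sampled runs enter the corridor, matching the slide's example.
runs = (
    [[("hall", "east"), ("corridor", "east")]] * 95
    + [[("hall", "stay")]] * 5
)
pr_cl = estimate_pr_cl(runs, "corridor", "east")
```

The estimate only needs rollouts of the agent's own local policy, so it requires no communication.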
![Page 45: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/45.jpg)
D-TREMOR: Interaction Exchange
Finding ValCLi
• Sample local policy value with/without interactions [Kearns 2002]
  – Test interactions independently
• Compute change in value if interaction occurred
  – No collision: +1
  – Collision: -6
  – ValCLi = -7
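The value estimate can be sketched as a difference of sample means, assuming the local policy's value has been sampled once with the interaction forced on and once with it forced off. The function name and inputs are hypothetical.

```python
def estimate_val_cl(values_without, values_with):
    """Mean sampled value with the interaction minus the mean without it."""
    def mean(xs):
        return sum(xs) / len(xs)
    return mean(values_with) - mean(values_without)

# Slide example: +1 without a collision, -6 with one.
val_cl = estimate_val_cl(
    values_without=[1.0, 1.0, 1.0],
    values_with=[-6.0, -6.0, -6.0],
)
```

Here the result is -6 - (+1) = -7, the ValCLi shown on the slide.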
![Page 46: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/46.jpg)
D-TREMOR: Interaction Exchange
• Send CL messages to teammates:
• Sparsity → Relatively small # of messages
![Page 47: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/47.jpg)
D-TREMOR: Model Shaping
• Shape local model rewards/transitions based on remote interactions
  – Probability of interaction
  – Interaction model functions
  – Independent model functions
![Page 50: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/50.jpg)
D-TREMOR: Adv. Model Shaping
• Slow convergence → Prioritization
  – Majority of interactions are collisions
  – Assign priorities to agents, only model-shape collision interactions for higher priority agents
  – From DPP: prioritization can quickly resolve collision interactions
  – Similar properties for any purely negative interaction
    • Negative interaction: when every agent is guaranteed to have a lower-valued local policy if an interaction occurs
![Page 52: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/52.jpg)
D-TREMOR: Adv. Model Shaping
• Local Optima → Optimistic initialization
  – Agents cannot detect mixed interactions (e.g. debris)
    • Rescue agent policies can only improve if debris is cleared
    • Cleaner agent policies can only worsen if they clear debris
PrCL = low, ValCL = low
If (ValCL = low): optimal policy → do nothing
PrCL = low, ValCL = low
![Page 62: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/62.jpg)
Conclusions
D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs.
  – Partially-observable, uncertain world
  – Multiple types of interactions & agents
• Improves over independent planning
• Resolved interactions in large problems
• Still some convergence/efficiency issues
![Page 63: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/63.jpg)
DPCL vs. other models
• EDI/EDI-CR
  – Adds complex transition functions
• TD-Dec-MDP
  – Allows simultaneous interaction (within epoch)
• Factored MDP/POMDP
  – Adds interactions that span epochs
![Page 64: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/64.jpg)
D-TREMOR
![Page 65: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/65.jpg)
D-TREMOR
![Page 66: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/66.jpg)
D-TREMOR: Reward functions
• Probability that a debris will not allow a robot to enter the cell
  – P_Debris = 0.9;
• Probability of action failure
  – P_ActionFailure = 0.2;
• Probability that success is observed if the action succeeded
  – P_ObsSuccessOnSuccess = 0.8;
• Probability that success is observed if the action failed
  – P_ObsSuccessOnFailure = 0.2;
• Probability that a robot will return to the same cell after collision
  – P_ReboundAfterCollision = 0.5;
• Reward of saving a victim
  – R_Victim = 10.0;
• Reward of cleaning debris
  – R_Cleaning = 0.25;
• Reward of moving
  – R_Move = -0.5;
• Reward of observing
  – R_Observe = -0.25;
• Reward for a collision
  – R_Collision = -5.0;
• Reward for landing in an unsafe cell
  – R_Unsafe = -1;
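As a worked example of how the observation parameters combine, the sketch below applies Bayes' rule to the listed values. It assumes an action succeeds with probability 1 - P_ActionFailure and that the observation refers to that single action; the derived variable names are hypothetical.

```python
P_ActionFailure = 0.2
P_ObsSuccessOnSuccess = 0.8
P_ObsSuccessOnFailure = 0.2

p_success = 1.0 - P_ActionFailure

# Total probability of observing "success" after one action.
p_obs_success = (
    P_ObsSuccessOnSuccess * p_success
    + P_ObsSuccessOnFailure * P_ActionFailure
)

# Posterior probability that the action really succeeded, given that observation.
posterior_success = P_ObsSuccessOnSuccess * p_success / p_obs_success
```

Under these assumptions an observed success is seen 68% of the time and is correct with probability 0.64 / 0.68, roughly 0.94.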
![Page 67: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/67.jpg)
Review: POMDP
(Figure labels: +100, -10)
• Set of States
• Set of Actions
• Set of Observations
• Transition function
• Reward function
• Observation function
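To show how the transition and observation functions are used together, here is a sketch of the standard POMDP belief update. The nested-dict layout T[s][a][s2] and O[s2][a][o], and the two-state example, are assumptions chosen for brevity rather than anything taken from the slides.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes filter: b'(s2) is proportional to O[s2][a][o] * sum_s b(s) * T[s][a][s2].

    Assumes the observation has non-zero probability under the current belief.
    """
    unnormalized = {}
    for s2 in belief:
        predicted = sum(belief[s] * T[s][action][s2] for s in belief)
        unnormalized[s2] = O[s2][action][observation] * predicted
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}

states = ("left", "right")
# "listen" leaves the state unchanged and is right 85% of the time.
T = {s: {"listen": {s2: 1.0 if s == s2 else 0.0 for s2 in states}} for s in states}
O = {
    "left": {"listen": {"hear_left": 0.85, "hear_right": 0.15}},
    "right": {"listen": {"hear_left": 0.15, "hear_right": 0.85}},
}
b = belief_update({"left": 0.5, "right": 0.5}, "listen", "hear_left", T, O)
```

Starting from a uniform belief, one "hear_left" observation moves the belief in "left" to 0.85.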
![Page 68: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/68.jpg)
Distributed POMDP with Coordination Locales
[Varakantham, et al 2009]
• Extension of Dec-POMDP which modifies the joint model functions
• Coordination locales (CLs) represent interactions:
  – Explicit time constraint
  – Implicitly construct interaction functions
![Page 70: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/70.jpg)
Proposed Approach: DIMS (Distributed Iterative Model Shaping)
Task Allocation
Local Planning
Interaction Exchange
Model Shaping
• Assign tasks to agents
• Reduce search space considered by agent
• Define local sub-problem for each robot
Full SI-Dec-POMDP
Local (Independent) POMDP
![Page 71: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/71.jpg)
Proposed Approach: DIMS (Distributed Iterative Model Shaping)
Task Allocation
Local Planning
Interaction Exchange
Model Shaping
• Solve local sub-problems using off-the-shelf centralized solver
• Result: Locally-optimal policy
![Page 72: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/72.jpg)
Proposed Approach: DIMS (Distributed Iterative Model Shaping)
Task Allocation
Local Planning
Interaction Exchange
Model Shaping
• Given local policy: estimate local probability and value of interactions
• Communicate local probability and value of relevant interactions to team members
• Sparsity → Relatively small # of messages
![Page 73: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/73.jpg)
Proposed Approach: DIMS (Distributed Iterative Model Shaping)
Task Allocation
Local Planning
Interaction Exchange
Model Shaping
• Modify local sub-problems to account for presence of interactions
![Page 74: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/74.jpg)
Proposed Approach: DIMS (Distributed Iterative Model Shaping)
Task Allocation
Local Planning
Interaction Exchange
Model Shaping
• Reallocate tasks or re-plan using modified local sub-problem
![Page 75: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/75.jpg)
Proposed Approach: DIMS (Distributed Iterative Model Shaping)
• Task Allocation: Any decentralized allocation mechanism (e.g. auctions)
• Local Planning: Stock graph, MDP, POMDP solver
• Interaction Exchange: Lightweight local evaluation and low-bandwidth messaging
• Model Shaping: Methods to alter local problem to incorporate non-local effects
![Page 76: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/76.jpg)
Example: Interactions
Rescue robot
Cleaner robot
Debris
Victim
![Page 77: Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents](https://reader035.vdocuments.site/reader035/viewer/2022062518/5681400c550346895dab4611/html5/thumbnails/77.jpg)
Example: Sparsity