
May 8, 2013 - Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents

Approximate Solutions for Factored Dec-POMDPs with Many Agents

Frans A. Oliehoek, Shimon Whiteson, & Matthijs T.J. Spaan

AAMAS, Wednesday May 8, 2013

Visual Roadmap

model
background: FSPC
Factored FSPC
practical factored FFSPC
results


Multiagent decision making under Uncertainty

Outcome Uncertainty

Partial Observability

Multiagent Systems: uncertainty about others

Formal Model

A Dec-POMDP $\langle S, A, P_T, O, P_O, R, h \rangle$

n agents
S – set of states
A – set of joint actions, $a = \langle a_1, a_2, \ldots, a_n \rangle$
$P_T$ – transition function, $P(s' \mid s, a)$
O – set of joint observations, $o = \langle o_1, o_2, \ldots, o_n \rangle$
$P_O$ – observation function, $P(o \mid a, s')$
R – reward function, $R(s, a) = \mathbb{E}_{s'}[R(s, a, s')]$
h – horizon (finite)
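The expected reward R(s, a) is the average of R(s, a, s') over next states s' drawn from P(s' | s, a). A minimal sketch of that computation follows; the two-state example, the action name `fight`, and all numbers are illustrative assumptions, not taken from the talk.

```python
def expected_reward(s, a, states, trans, reward):
    """R(s, a) = sum over s' of P(s' | s, a) * R(s, a, s')."""
    return sum(trans[(s, a)][s2] * reward(s, a, s2) for s2 in states)


# Hypothetical two-state example (all numbers are assumptions).
states = ["burning", "out"]
trans = {("burning", "fight"): {"burning": 0.25, "out": 0.75}}


def reward(s, a, s2):
    # Assumed reward: -4 if the house is still burning afterwards, else 0.
    return -4.0 if s2 == "burning" else 0.0


r = expected_reward("burning", "fight", states, trans, reward)
```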

Factored Dec-POMDPs

state variables or 'factors'

E.g., 'Firefighting Graph'
H houses h1...hH
H-1 agents
actions: left or right
observations: flames or not

[Figure, built up over several slides: four houses 1 2 3 4 in a row and the corresponding network over two stages, t and t+1. Nodes: state factors h1...h4 at stage t and h1'...h4' at stage t+1, actions a1...a3, observations o1...o3, and the reward R1. Labels: "CPTs (for each factor)", "reward for house 1", "Expected reward for house 1".]
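With one CPT per factor, the joint transition probability is a product of per-factor entries. The sketch below illustrates this for a small Firefighting-Graph-like problem. The CPT numbers, the binary "burning" state per house, and the convention that agent j stands between houses j and j+1 are assumptions made for illustration only.

```python
from itertools import product


def agent_house(agent, action):
    """Assumption: agent j (0-based) stands between houses j and j+1."""
    return agent if action == "left" else agent + 1


def burn_prob(i, state, joint_action):
    """Hypothetical CPT: P(house i burns at t+1 | its parents at t)."""
    fought = any(agent_house(j, a) == i for j, a in enumerate(joint_action))
    neighbors = [state[k] for k in (i - 1, i + 1) if 0 <= k < len(state)]
    if fought:
        return 0.1 if state[i] else 0.0
    if state[i]:
        return 1.0
    return 0.5 if any(neighbors) else 0.0


def transition_prob(next_state, state, joint_action):
    """P(s' | s, a) as a product of one CPT entry per factor."""
    prob = 1.0
    for i, burning_next in enumerate(next_state):
        p = burn_prob(i, state, joint_action)
        prob *= p if burning_next else 1.0 - p
    return prob


state = (1, 0, 0)                # 3 houses, only house 0 is burning
joint_action = ("left", "left")  # agent 0 at house 0, agent 1 at house 1
p_all_out = transition_prob((0, 0, 0), state, joint_action)
total = sum(transition_prob(s2, state, joint_action)
            for s2 in product((0, 1), repeat=3))
```

Each factor only looks at its own parents, so the model needs a handful of small tables instead of one table over all joint states.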

Background: FSPC – 1

Goal: scalability w.r.t. #agents

Forward-sweep policy computation (FSPC)
for each t = 0, ..., h-1: compute best joint decision rule $\delta^t$ given past joint policy $\varphi^t$

[Figure, built up over several slides: $\varphi^0 = ()$, then $\delta^0$, then $\varphi^1 = (\delta^0)$, then $\delta^1$, then $\varphi^2 = (\delta^0, \delta^1)$.]
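The forward sweep can be sketched as a simple loop that extends the past joint policy by one decision rule per stage. This is only a skeleton: `best_decision_rule` is a hypothetical callback standing in for whatever per-stage solver is used, and `toy_rule` is a placeholder that returns labels rather than real decision rules.

```python
def forward_sweep(horizon, best_decision_rule):
    """Build a joint policy stage by stage, never revisiting earlier stages."""
    phi = ()  # past joint policy at t = 0 is empty
    for t in range(horizon):
        delta = best_decision_rule(phi, t)  # decision rule for stage t, given phi
        phi = phi + (delta,)                # past joint policy for stage t + 1
    return phi


seen = []


def toy_rule(phi, t):
    # Placeholder solver: records how long the past policy is at each stage.
    seen.append(len(phi))
    return f"delta{t}"


policy = forward_sweep(3, toy_rule)
```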

Background: FSPC – 2

Problem at stage t: select $\delta = \langle \delta_1, \ldots, \delta_n \rangle$
$\delta_i$: histories → actions, $\delta_i^t(\vec{\theta}_i^t) = a_i^t$

As collaborative Bayesian games (CBGs)
agents, actions
types $\theta_i$ ↔ histories
probabilities: $P(\theta)$
payoffs: $Q(\theta, a)$

[Figure: past joint policies $\varphi^0$, $\varphi^1$, $\varphi^2$, each with its CBG $B(\varphi^0)$, $B(\varphi^1)$, $B(\varphi^2)$.]

Background: CBGs

[Figure: the CBGs $B(\varphi^0)$, $B(\varphi^1)$, $B(\varphi^2)$, with an example payoff table for agent 1 and agent 2 in a problem with houses 1 2 3. Types: Flames / No Flames for each agent. Actions: H1 or H2 for agent 1, H2 or H3 for agent 2. The only entries shown are +2 and -1, in a row for H1 under columns H2 and H3; all other entries are elided ("...").]
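To make the CBG concrete, here is a brute-force sketch that enumerates every joint decision rule (one type-to-action mapping per agent) and keeps the one with the highest expected payoff, i.e. the sum over joint types of P(θ) Q(θ, δ(θ)). The uniform type distribution and the payoff function are invented for illustration; they are not the values from the talk's table.

```python
from itertools import product


def solve_cbg(types, actions, prob, payoff):
    """Exhaustively search joint decision rules of a collaborative Bayesian game."""
    n = len(types)
    # All individual decision rules: one action choice per type, per agent.
    rules = [
        [dict(zip(types[i], choice))
         for choice in product(actions[i], repeat=len(types[i]))]
        for i in range(n)
    ]
    best_value, best_rule = float("-inf"), None
    for joint_rule in product(*rules):
        value = 0.0
        for theta, p in prob.items():
            a = tuple(joint_rule[i][theta[i]] for i in range(n))
            value += p * payoff(theta, a)
        if value > best_value:
            best_value, best_rule = value, joint_rule
    return best_value, best_rule


types = [("flames", "no flames"), ("flames", "no flames")]
actions = [("H1", "H2"), ("H2", "H3")]
prob = {theta: 0.25 for theta in product(*types)}  # assumed uniform


def payoff(theta, a):
    # Hypothetical payoff: +1 per agent acting on its observation,
    # minus 0.5 if both agents end up at the shared house H2.
    preferred = ({"flames": "H1", "no flames": "H2"},
                 {"flames": "H3", "no flames": "H2"})
    score = sum(1.0 for i in range(2) if a[i] == preferred[i][theta[i]])
    if a == ("H2", "H2"):
        score -= 0.5
    return score


value, rule = solve_cbg(types, actions, prob, payoff)
```

The number of joint decision rules grows exponentially with the number of agents and types, which is why this enumeration is only practical for very small games.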

This paper: Factored FSPC

Factored FSPC – Basic idea: exploit independence between agents in FSPC

if value function factored
$Q(x, \vec{\theta}, a) = \sum_e Q_e(x_e, \vec{\theta}_e, a_e)$
→ replace CBGs with collaborative graphical Bayesian games (CGBGs)

[Figure: CGBG with agent 1, agent 2 and agent 3.]

Total payoff is sum of components
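A small sketch of "total payoff is sum of components": each component sees only the types and actions of the agents in its scope. For brevity the state factors are left out, and the scopes, the local payoff `q_pair`, and the example types and actions are all hypothetical.

```python
def factored_payoff(components, theta, a):
    """Sum local payoffs; each component reads only the agents in its scope."""
    total = 0.0
    for scope, q_e in components:
        theta_e = tuple(theta[i] for i in scope)
        a_e = tuple(a[i] for i in scope)
        total += q_e(theta_e, a_e)
    return total


def q_pair(theta_e, a_e):
    # Hypothetical local payoff: +1 if both agents pick the same house
    # and at least one of them observed flames, else 0.
    return 1.0 if a_e[0] == a_e[1] and "flames" in theta_e else 0.0


# Agents 0 and 1 share one component, agents 1 and 2 another;
# agents 0 and 2 never appear in the same component.
components = [((0, 1), q_pair), ((1, 2), q_pair)]
theta = ("flames", "no flames", "no flames")
a = ("H2", "H2", "H3")
total = factored_payoff(components, theta, a)
```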

Factored FSPC for Many Agents

Improving scalability w.r.t. the number of agents...

1) Approximate structure of Q*: predetermined scope structure
2) Compute CGBG payoff functions: Transfer Planning (TP)
3) Inference techniques to construct CGBGs: extension of factored frontier [Murphy & Weiss UAI 2001]
4) Efficient solutions of CGBGs: Max-plus to ATI-FG [OWS UAI 2012]

Rest of this talk: 1) and 2)
For 3) and 4): see paper

Accumulation of Reward over Time

[Figure, built up over several slides: the network unrolled over stages t=0, t=1, t=2, with state factors h1...h4, actions a1...a3 and observations o1...o3. Highlighted: the reward R1 and the value component $Q_{e=1}(x_e, \vec{\theta}_e, a_e)$, labelled Q1.]

Still factored, but components not local!
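Why the components stop being local can be seen by tracing dependencies backwards through time. The sketch below does this under assumed dependencies: house i at the next stage depends on houses i-1, i, i+1 and on the agents adjacent to house i, and the reward for house 1 depends on houses 1 and 2 and agent 1. These scopes are assumptions for illustration, and observation dependencies are ignored.

```python
def house_parents(i, num_houses):
    """Assumed state parents of house i: itself and its direct neighbors."""
    return {j for j in (i - 1, i, i + 1) if 1 <= j <= num_houses}


def agents_at_house(i, num_houses):
    """Assumption: agent j stands between houses j and j+1."""
    return {j for j in (i - 1, i) if 1 <= j <= num_houses - 1}


def backproject(factors, agents, num_houses):
    """Scope one stage earlier: everything that influences the current scope."""
    new_factors, new_agents = set(factors), set(agents)
    for i in factors:
        new_factors |= house_parents(i, num_houses)
        new_agents |= agents_at_house(i, num_houses)
    return new_factors, new_agents


def scopes_over_time(num_houses, horizon):
    """Scope of the house-1 component at each stage, from t=0 to t=horizon-1."""
    factors, agents = {1, 2}, {1}  # assumed scope of R1 at the last stage
    scopes = [(factors, agents)]
    for _ in range(horizon - 1):
        factors, agents = backproject(factors, agents, num_houses)
        scopes.append((factors, agents))
    scopes.reverse()
    return scopes


scopes = scopes_over_time(num_houses=4, horizon=3)
```

Under these assumptions the component for house 1 already involves every house and every agent at t=0 in a 4-house, horizon-3 problem.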

1) Predetermined Scope Structure

Use smaller scopes

[Figure, built up over two slides: the network unrolled over stages t=0, t=1, t=2, with components Q1, Q2, Q3, Q4 highlighted.]

2) Transfer Planning

Define a source problem for each component involving fewer agents

Solve those exactly or approximately (QMDP, etc.)

[Figure: the network unrolled over stages t=0, t=1, t=2, with components Q2 and Q3 highlighted, next to two smaller problems labelled "1 2 3" and "2 3 4".]
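As one illustration of solving a small source problem approximately, the sketch below runs QMDP-style finite-horizon value iteration on the fully observable version of a hypothetical 3-house, 2-agent problem. The deterministic dynamics and the reward in `step` are invented for the example, and the mapping from these Q-values to the payoff of a component over histories is not shown.

```python
from itertools import product


def qmdp_values(states, joint_actions, step, horizon):
    """Finite-horizon Q-values of the underlying MDP (ignores partial observability).

    step(s, a) returns (reward, dict mapping next states to probabilities).
    """
    Q = [dict() for _ in range(horizon)]
    v_next = {s: 0.0 for s in states}
    for t in reversed(range(horizon)):
        for s, a in product(states, joint_actions):
            r, dist = step(s, a)
            Q[t][(s, a)] = r + sum(p * v_next[s2] for s2, p in dist.items())
        v_next = {s: max(Q[t][(s, a)] for a in joint_actions) for s in states}
    return Q


def step(s, a):
    # Hypothetical dynamics: a house with an agent present stops burning,
    # every other house keeps its state. Reward: minus the burning houses left.
    fought = {j if act == "left" else j + 1 for j, act in enumerate(a)}
    s2 = tuple(0 if i in fought else b for i, b in enumerate(s))
    return -float(sum(s2)), {s2: 1.0}


states = list(product((0, 1), repeat=3))                    # 3 houses
joint_actions = list(product(("left", "right"), repeat=2))  # 2 agents
Q = qmdp_values(states, joint_actions, step, horizon=2)
```

Because the source problem involves only a few agents, its state and joint-action spaces stay small no matter how many agents the full problem has.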

Results – Compared to Optimal

[Figure: results; the only surviving label is "1 2 3 4".]

Fact. FSPC + TP + QBG performs optimally

Results – vs. Approximate

FFSPC: TP vs. ADP, NR
Non-factored FSPC
DICE [Oliehoek et al. 2008 Informatica]

TP achieves (near-) best value...
...and superior scaling behavior


Results – Many Agents

Conclusions

Factored FSPC with transfer planning:
  approximates factored Dec-POMDPs with multiple abstractions involving subsets of agents

Unprecedented scalability for this class:
  results up to 1000 agents

Future work:
  scale to higher horizon
  understand such abstractions
    empirically verified near-optimal quality
    formal understanding
    influence-based abstraction [AAAI '12]

References

R. Emery-Montemerlo, G. Gordon, J. Schneider, and S. Thrun. Approximate solutions for partially observable stochastic games with common payoffs. In AAMAS, 2004.
K. P. Murphy and Y. Weiss. The factored frontier algorithm for approximate inference in DBNs. In UAI, 2001.
F. A. Oliehoek, J. F. Kooi, and N. Vlassis. The cross-entropy method for policy search in decentralized POMDPs. Informatica, 32:341–357, 2008.
F. A. Oliehoek, M. T. J. Spaan, S. Whiteson, and N. Vlassis. Exploiting locality of interaction in factored Dec-POMDPs. In AAMAS, 2008.
F. A. Oliehoek, S. Witwicki, and L. P. Kaelbling. Influence-Based Abstraction for Multiagent Systems. In AAAI, 2012.
F. A. Oliehoek, S. Whiteson, and M. T. J. Spaan. Exploiting structure in cooperative Bayesian games. In UAI, 2012.
D. Szer, F. Charpillet, and S. Zilberstein. MAA*: A heuristic search algorithm for solving decentralized POMDPs. In UAI, 2005.