Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs
Rosemary Emery-Montemerlo
joint work with
Geoff Gordon, Jeff Schneider and Sebastian Thrun
July 21, 2004 AAMAS 2004
Robot Teams
With limited communication, existing paradigms for decentralized robot control are not sufficient
Game theoretic methods are necessary for multi-robot coordination under these conditions
Decentralized Decision Making
A robot cannot choose actions based only on joint observations consistent with its own sensor readings
It must consider all joint observations that are consistent with its possible sensor readings
Relationship Between Decision Theoretic Models
[Figure: diagram relating MDP, POMDP and "?" to State Space, Belief Space and Distribution over Belief Space]
Models of Multi-Agent Systems
Partially observable stochastic games
  Generalization of stochastic games to partially observable worlds
Related models
  DEC-POMDP [Bernstein et al., 2000]
  MTDP [Pynadath and Tambe, 2002]
  I-POMDP [Gmytrasiewicz and Doshi, 2004]
  POIPSG [Peshkin et al., 2000]
Partially Observable Stochastic Games
POSG = {I, S, A, Z, T, R, O}
I is the set of agents, I = {1,…,n}
S is the set of states
A is the set of actions, A = A_1 × … × A_n
Z is the set of observations, Z = Z_1 × … × Z_n
T is the transition function, T: S × A → S
R is the reward function, R: S × A → ℝ
O are the observation emission probabilities, O: S × Z × A → [0,1]
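As a minimal sketch of how the tuple above might be held in code, the following stores each function as a lookup table and builds the joint action and observation spaces as Cartesian products. The class name `POSG`, its field layout, and the toy two-robot problem are assumptions made for illustration, not taken from the talk. Transitions are stored as distributions over next states, so a deterministic transition is the special case of a single entry with probability 1.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class POSG:
    """Hypothetical container for the tuple {I, S, A, Z, T, R, O}."""
    n_agents: int
    states: list
    actions: list        # actions[i] = individual actions of agent i
    observations: list   # observations[i] = individual observations of agent i
    T: dict              # T[(s, joint_a)] = {next_state: probability}
    R: dict              # R[(s, joint_a)] = common reward
    O: dict              # O[(next_state, joint_a)] = {joint_z: probability}

    def joint_actions(self):
        # A = A_1 x ... x A_n
        return list(product(*self.actions))

    def joint_observations(self):
        # Z = Z_1 x ... x Z_n
        return list(product(*self.observations))

    def check(self):
        """Return True if every stored distribution sums to 1."""
        dists = list(self.T.values()) + list(self.O.values())
        return all(abs(sum(d.values()) - 1.0) < 1e-9 for d in dists)


# Toy problem (assumed): two robots, two states, two actions each.
toy = POSG(
    n_agents=2,
    states=["s0", "s1"],
    actions=[["wait", "go"], ["wait", "go"]],
    observations=[["none", "see"], ["none", "see"]],
    T={}, R={}, O={},
)
for s in toy.states:
    for a in toy.joint_actions():
        toy.T[(s, a)] = {"s1": 1.0} if "go" in a else {s: 1.0}
        toy.R[(s, a)] = 1.0 if s == "s1" else 0.0
        toy.O[(s, a)] = {z: 0.25 for z in toy.joint_observations()}
```

Even in this toy the joint spaces have 4 elements each; they grow exponentially with the number of agents, which is one reason exact solution becomes expensive.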
Solving POSGs
Exact solutions to POSGs are computationally infeasible for all but the smallest problems
[Figure: the full POSG alongside the one-step lookahead game at time t (a Bayesian game)]
We can approximate a POSG as a series of smaller Bayesian games
Bayesian Games
Private information relevant to game
  Uncertainty in utility
Type
  Encapsulates private information
  We limit ourselves to games with a finite number of types
In robot example
  Type 1: Robot doesn't see anything
  Type 2: Robot sees intruder at location x
Bayesian Games
BG = {I, Θ, A, p(Θ), u}
Θ is the joint type space, Θ = Θ_1 × … × Θ_n
θ is a specific joint type, θ = {θ_1,…,θ_n}
p(Θ) is the common prior on the distribution over Θ
u is the utility function, u = {u_1,…,u_n}
  u_i(a_i, a_-i, (θ_i, θ_-i))
σ_i is a strategy for player i
  Defines what player i does for each of its possible types
  Actions are individual actions, not joint actions
Bayesian-Nash Equilibrium
Set of best response strategies
Each agent tries to maximize its expected utility conditioned on its probability distribution over the other agents' types
  p(Θ)
Each agent has a policy σ_i that, given σ_-i, maximizes u_i(σ_i, σ_-i, θ_-i)
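The slides do not say how an equilibrium is computed. One simple option for common-payoff games is alternating maximization: each agent in turn picks, for each of its types, the action that is a best response to the other agents' current strategies, and the sweep repeats until nobody changes. Because all agents share one payoff, every change strictly raises the expected payoff, so the loop stops at a Bayesian-Nash equilibrium, though possibly only a locally optimal one. The sketch below assumes this solver; the function names, the two-type prior, and the toy payoff are hypothetical.

```python
def expected_utility(policies, prior, u):
    """Expected common payoff of a joint strategy under the prior over joint types."""
    total = 0.0
    for theta, p in prior.items():
        joint_a = tuple(policies[i][theta[i]] for i in range(len(policies)))
        total += p * u(joint_a, theta)
    return total


def alternating_maximization(types, actions, prior, u, max_sweeps=100):
    n = len(types)
    # Arbitrary starting strategy: every type plays the agent's first action.
    policies = [{th: actions[i][0] for th in types[i]} for i in range(n)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            for th_i in types[i]:
                def value(a_i):
                    # Payoff summed over joint types consistent with th_i.
                    # Left unnormalized: dividing by p(th_i) does not change the argmax.
                    v = 0.0
                    for theta, p in prior.items():
                        if theta[i] != th_i:
                            continue
                        joint_a = tuple(
                            a_i if j == i else policies[j][theta[j]] for j in range(n)
                        )
                        v += p * u(joint_a, theta)
                    return v

                best = max(actions[i], key=value)
                if value(best) > value(policies[i][th_i]) + 1e-12:
                    policies[i][th_i] = best
                    changed = True
        if not changed:
            break  # no agent can improve unilaterally
    return policies


# Toy game (assumed): type 1 means "this robot sees the intruder".
types = [[0, 1], [0, 1]]
actions = [["L", "R"], ["L", "R"]]
prior = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}


def u(joint_a, theta):
    target = "R" if 1 in theta else "L"
    correct = sum(a == target for a in joint_a)
    return correct + (1 if correct == 2 else 0)  # bonus when both robots are right


policies = alternating_maximization(types, actions, prior, u)
value = expected_utility(policies, prior, u)
```

In this toy game each robot ends up going right exactly when it sees the intruder. Note that each strategy maps an agent's own type to an individual action, never to a joint action.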
POSG to Bayesian Game Approximation
{I, S, A, Z, T, R, O} to {I, Θ, A, p(Θ), u}^t
I = I
A = A
Type space Θ_i^t = all possible histories of agent i's actions and observations up to time t
p(Θ^t) calculated from S_0, A, T, Z, O, Θ^{t-1}
  Prune low probability types
  Each joint type maps to a joint belief
u given by heuristic and u_i = u_j
  QMDP
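The slide only states that the type distribution at time t is computed from the model and the previous type space, with low-probability types pruned. A minimal sketch of one way to do this is shown below, assuming a standard Bayes-filter update of the joint belief attached to each joint type. The function name `propagate`, the dictionary encodings, the pruning threshold, and the toy sensing model are all assumptions for illustration.

```python
def propagate(joint_types, policies, joint_obs, T, O, threshold=0.01):
    """Extend every joint type by one step, then prune and renormalize.

    joint_types: {theta: (prob, belief)}, theta = tuple of per-agent histories,
                 belief = {state: prob}
    policies[i][theta[i]] = individual action of agent i for its own history
    T[(s, a)] = {s2: prob};  O[(s2, a)] = {joint_z: prob}
    """
    n = len(policies)
    new = {}
    for theta, (p, belief) in joint_types.items():
        a = tuple(policies[i][theta[i]] for i in range(n))
        # Prediction step: push the joint belief through the transition model.
        pred = {}
        for s, bs in belief.items():
            for s2, pt in T[(s, a)].items():
                pred[s2] = pred.get(s2, 0.0) + bs * pt
        # Each joint observation spawns one successor joint type.
        for z in joint_obs:
            unnorm = {s2: ps * O[(s2, a)].get(z, 0.0) for s2, ps in pred.items()}
            pz = sum(unnorm.values())
            if pz == 0.0:
                continue
            theta2 = tuple(theta[i] + ((a[i], z[i]),) for i in range(n))
            new[theta2] = (p * pz, {s2: v / pz for s2, v in unnorm.items()})
    # Prune low-probability joint types and renormalize what is left.
    kept = {th: v for th, v in new.items() if v[0] >= threshold}
    total = sum(p for p, _ in kept.values())
    if total == 0.0:
        raise ValueError("threshold pruned every joint type")
    return {th: (p / total, b) for th, (p, b) in kept.items()}


# Toy model (assumed): static world, each robot senses the state with 90% accuracy.
look = ("look", "look")
joint_obs = [(z1, z2) for z1 in ("L", "R") for z2 in ("L", "R")]
T = {("left", look): {"left": 1.0}, ("right", look): {"right": 1.0}}
O = {}
for s, good in (("left", "L"), ("right", "R")):
    O[(s, look)] = {
        z: (0.9 if z[0] == good else 0.1) * (0.9 if z[1] == good else 0.1)
        for z in joint_obs
    }
start = {((), ()): (1.0, {"left": 0.5, "right": 0.5})}
policies = [{(): "look"}, {(): "look"}]

all_types = propagate(start, policies, joint_obs, T, O, threshold=0.05)
pruned = propagate(start, policies, joint_obs, T, O, threshold=0.1)
```

With the lower threshold all four successor joint types survive; with the higher one the two disagreeing-observation types (probability 0.09 each) are dropped and the remaining two are renormalized to 0.5 each.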
Algorithm (each agent runs the same loop in parallel)
Agent i
  Initialize: t = 0, h_i = {}, p(Θ^0), σ^0 = solveGame(Θ^0, p(Θ^0))
  Make Observation: h_i = obs_i^t ∪ a_i^{t-1} ∪ h_i
  Determine Type: θ_i^t = bestMatch(h_i, Θ_i^t)
  Execute Action: a_i^t = σ_i^t(θ_i^t)
  Propagate Forward: Θ^{t+1}, p(Θ^{t+1})
  Find Policy for t+1: σ^{t+1} = solveGame(Θ^{t+1}, p(Θ^{t+1})), then t = t+1
Agent j
  Same steps with j in place of i: h_j, θ_j^t = bestMatch(h_j, Θ_j^t), a_j^t = σ_j^t(θ_j^t)
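The ordering of the per-agent steps can be sketched as a single loop. The skeleton below is only an outline under stated assumptions: `run_agent` and its parameters are hypothetical names, and the game solver, forward propagation, type matching, sensing and actuation are passed in as callables because the slides do not specify their internals. The toy stand-ins at the bottom exist only to make the skeleton runnable.

```python
def run_agent(i, steps, types, prior, solve_game, propagate, best_match,
              observe, execute):
    """One agent's loop; every teammate runs the same code with its own index."""
    # Initialize: solve the game for time 0 (policies for ALL agents).
    policies = solve_game(types, prior)
    history = ()
    last_action = None
    for t in range(steps):
        # Make observation: append own previous action and new observation.
        history = history + ((last_action, observe(t)),)
        # Determine type: map the local history to one of this agent's types.
        theta_i = best_match(history, types[i])
        # Execute action prescribed by this agent's strategy for that type.
        last_action = policies[i][theta_i]
        execute(last_action)
        # Propagate the joint type space and its distribution forward,
        # then solve the next one-step game.
        types, prior = propagate(types, prior, policies)
        policies = solve_game(types, prior)
    return history


# Toy stand-ins (assumed): a single type per agent that always moves north.
log = []
final_history = run_agent(
    i=0,
    steps=3,
    types=[["only"], ["only"]],
    prior={("only", "only"): 1.0},
    solve_game=lambda types, prior: [{th: "N" for th in ts} for ts in types],
    propagate=lambda types, prior, policies: (types, prior),
    best_match=lambda history, own_types: own_types[0],
    observe=lambda t: f"z{t}",
    execute=log.append,
)
```

Each agent solves the game for every agent's strategy, not only its own. Since all agents start from the same prior and run the same deterministic computation, they arrive at the same joint policy without communicating.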
Robotic Team Tag
Version of Team Tag
  Environment is portion of Gates Hall
  Full teammate observability
  Opponent can be captured by a single robot in any state
  QMDP used as heuristic
Two Pioneer-class robots
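For readers unfamiliar with the QMDP heuristic mentioned above: it scores an action at a belief by averaging the Q-values of the underlying fully observable MDP, weighted by the belief. A minimal sketch follows, using plain value iteration on a made-up three-state chase problem. The function names, discount factor, and toy model are assumptions; in the talk's setting the action would be a joint action and the belief would be the joint belief attached to a joint type.

```python
def q_mdp(states, actions, T, R, gamma=0.95, iters=200):
    """Value iteration on the fully observable MDP; returns Q[(s, a)]."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(iters):
        V = {s: max(Q[(s, a)] for a in actions) for s in states}
        Q = {
            (s, a): R[(s, a)] + gamma * sum(p * V[s2] for s2, p in T[(s, a)].items())
            for s in states
            for a in actions
        }
    return Q


def qmdp_utility(belief, a, Q):
    """QMDP score of action a at a belief: sum over s of b(s) * Q(s, a)."""
    return sum(b * Q[(s, a)] for s, b in belief.items())


# Toy model (assumed): tagging only succeeds when the opponent is near.
states = ["far", "near", "done"]
actions = ["move", "tag"]
T = {
    ("far", "move"): {"near": 1.0},
    ("far", "tag"): {"far": 1.0},
    ("near", "move"): {"near": 1.0},
    ("near", "tag"): {"done": 1.0},
    ("done", "move"): {"done": 1.0},
    ("done", "tag"): {"done": 1.0},
}
R = {(s, a): (0.0 if s == "done" else -1.0) for s in states for a in actions}

Q = q_mdp(states, actions, T, R)
unsure = {"far": 0.5, "near": 0.5}
probably_far = {"far": 0.9, "near": 0.1}
best_unsure = max(actions, key=lambda a: qmdp_utility(unsure, a, Q))
best_far = max(actions, key=lambda a: qmdp_utility(probably_far, a, Q))
```

QMDP implicitly assumes that all uncertainty disappears after one step, so it never values actions taken purely to gather information.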
Robot Policies
Lady And The Tiger [Nair et al. 2003]
[Figure: Computation Time. x-axis: Horizon (3 to 10); y-axis: Time (ms), 0 to 1,000,000; series: Full POSG, Bayesian Game Approximation]
Contributions
Algorithm for finding approximate solutions to POSGs with common payoffs
  Tractability achieved by modeling the POSG as a sequence of Bayesian games
Performs comparably to the full POSG for a small finite-horizon problem
Improved performance over 'blind' application of utility heuristic in more complex problems
Successful real-time game-theoretic controller for indoor robots
Questions?
www.cs.cmu.edu/~remery
Back-Up Slides
Policy Performance
[Figure: Lady And The Tiger [Nair et al. 2003]. x-axis: Horizon (3 to 10); y-axis: Expected or Average Reward (-80 to 20); series: Full POSG, Bayesian Game Approximation, Selfish Policy]
Robotic Team Tag
I = {1,2}
S = S_1 × S_2 × S_opponent
  S_i = {s_0,…,s_28}, S_opponent = {s_0,…,s_28, s_tagged}
  |S| = 29 × 29 × 30 = 25230
A_i = {N, S, E, W, Tag}
Z_i = [{s_i, -1}, s_-i, a_-i]
T: adjacent cells
O: see opponent if on same cell
R: minimize capture time
Modified from [Pineau et al. 2003]
Environment
Performance
[Figure: Robotic Team Tag Results. y-axis: Average Discounted Value (-60 to 0); groups: Full Observability of Teammate's Position, Without Full Observability of Teammate's Position; series: Full Observability, Most Likely State, QMDP, Bayesian Approximation]
Performance
[Figure: Robotic Team Tag Results. y-axis: Average Timesteps (0 to 100); groups: Full Observability of Teammate's Position, Without Full Observability of Teammate's Position; series: Full Observability, Most Likely State, QMDP, Bayesian Approximation]