Nash Q-Learning for General-Sum Stochastic Games · ... › presentations › presentation11.pdf
TRANSCRIPT
Nash Q-Learning for General-Sum Stochastic Games
Hu, J. and Wellman, M.P.
Lauri Lyly, [email protected]
Stochastic general-sum games
● Stochasticity: the environment is in part formed by the other agents
  – Non-deterministic, non-cooperative setting with no binding agreements
● Arbitrary relation between the agents' rewards
  – Extends the previous topic, zero-sum games
Nash Equilibrium
● Best-response joint strategy: each agent's strategy is a best response to the others'
● Study limited to stationary strategies (policies)
● Rewards of the other agents are observed; their strategies are not
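As a concrete sketch, a pure-strategy Nash equilibrium of a two-player game is a joint action from which neither player benefits by deviating unilaterally. The payoff matrices below are hypothetical illustrations, not from the paper:

```python
import numpy as np

# Hypothetical 2x2 coordination game (not from the paper): A[i, j] is
# player 1's reward and B[i, j] player 2's when player 1 plays row i
# and player 2 plays column j.
A = np.array([[2, 0], [0, 1]])
B = np.array([[2, 0], [0, 1]])

def is_pure_nash(A, B, i, j):
    # (i, j) is a best-response joint strategy: neither player can
    # improve its own payoff by a unilateral deviation.
    return A[i, j] >= A[:, j].max() and B[i, j] >= B[i, :].max()

equilibria = [(i, j) for i in range(2) for j in range(2)
              if is_pure_nash(A, B, i, j)]
print(equilibria)  # → [(0, 0), (1, 1)]
```

Note that this game has two equilibria with different payoffs, which is exactly the selection problem general-sum games introduce over the zero-sum case.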
Nash Q-Values
Stage game
● A one-period game, as opposed to a stochastic (multi-period) game
● Mainly used in the convergence proof
● Nash equilibrium is defined for the stage game, where M is a “payoff function”
Update rule
● The same update rule is used for the agent itself and for its conjectures of the other agents' Q-functions
● Q-functions can be initialized, for example, to 0
● Asynchronous updating: only the entries pertaining to the current state are updated
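The update rule above can be sketched for a two-agent game as follows. This is a minimal illustration: the stage-game solver only searches pure-strategy equilibria (the full algorithm also handles mixed strategies), and all table shapes, rewards, and parameter values are made up:

```python
import numpy as np

def pure_nash_payoffs(M1, M2):
    # Stage-game solver restricted to pure strategies (a simplification).
    for i in range(M1.shape[0]):
        for j in range(M1.shape[1]):
            if M1[i, j] >= M1[:, j].max() and M2[i, j] >= M2[i, :].max():
                return M1[i, j], M2[i, j]
    raise ValueError("no pure-strategy equilibrium in this stage game")

def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha, beta):
    # Asynchronous update: only the entry for the current (s, a1, a2) moves.
    # Agent 1 keeps Q2 as its conjecture of agent 2's Q-function and updates
    # it with the same rule, using agent 2's observed reward r2.
    v1, v2 = pure_nash_payoffs(Q1[s_next], Q2[s_next])
    Q1[s][a1, a2] = (1 - alpha) * Q1[s][a1, a2] + alpha * (r1 + beta * v1)
    Q2[s][a1, a2] = (1 - alpha) * Q2[s][a1, a2] + alpha * (r2 + beta * v2)

# Q-functions initialized to 0: one 2x2 joint-action table per state.
Q1 = {s: np.zeros((2, 2)) for s in range(2)}
Q2 = {s: np.zeros((2, 2)) for s in range(2)}
nash_q_update(Q1, Q2, s=0, a1=0, a2=1, r1=5.0, r2=3.0,
              s_next=1, alpha=0.5, beta=0.9)
print(Q1[0][0, 1], Q2[0][0, 1])  # → 2.5 1.5
```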
The Nash Q-learning algorithm
Convergence proof requirements
● Assumption 1: every state-action tuple is visited infinitely often
● Assumption 2: the learning rate α(t) satisfies:
  – The sum Σₜ α(t) goes to infinity
  – The squared sum Σₜ α(t)² does not (it stays finite)
  – α(t) = 0 if the element being updated does not correspond to the current state-action tuple (asynchronous updating)
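A quick numerical illustration (not a proof) that the standard choice α(t) = 1/t meets both summability conditions: its partial sums grow without bound, while the partial sums of its squares stay bounded by π²/6:

```python
import math

alpha = lambda t: 1.0 / t  # a standard choice satisfying Assumption 2

s1 = sum(alpha(t) for t in range(1, 100_001))       # grows like log(t): unbounded
s2 = sum(alpha(t) ** 2 for t in range(1, 100_001))  # bounded, approaches pi^2/6

print(round(s1, 2), round(s2, 4))
```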
Proof basis and result
● The Q-learning process is updated by a pseudo-contraction operator P(t) in the usual form:
  Q(t+1) = (1 − α(t)) Q(t) + α(t) [P(t) Q(t)]
  Contraction: the values approach the optimal Q
● Links the stage games to the stochastic game
● Goal: show that NashQ is a pseudo-contraction operator
● Actually a real contraction operator under restricted conditions:
  – Games with special types of Nash equilibrium points
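The role of the contraction property can be seen with a toy fixed-point iteration. Here P is a made-up affine contraction standing in for the operator, not the NashQ operator itself; repeated application converges to the unique fixed point, which is what the proof establishes for the true Q-values:

```python
def P(Q):
    # Toy affine contraction with modulus 0.9; its unique fixed point
    # solves Q* = 0.9 * Q* + 1, i.e. Q* = 10.
    return 0.9 * Q + 1.0

Q = 0.0
for _ in range(200):
    Q = P(Q)  # each application shrinks the distance to Q* by a factor 0.9
print(round(Q, 2))  # → 10.0
```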
Different Nash equilibria
● Global optimal point, using stage-game notation
● Saddle point
● All equilibria chosen for the update must be of the same type
Stage games compared
● Global optimal point: (Up, Left)
● Saddle point: (Down, Right)
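The two equilibrium types can be checked mechanically. The matrices below are illustrative stand-ins for the slide's games, not the paper's exact payoffs; rows are player 1's actions (Up, Down) and columns player 2's (Left, Right):

```python
import numpy as np

def is_nash(A, B, i, j):
    return A[i, j] >= A[:, j].max() and B[i, j] >= B[i, :].max()

def is_global_optimal(A, B, i, j):
    # A Nash point where every player does at least as well as at ANY
    # joint action of the game.
    return is_nash(A, B, i, j) and A[i, j] >= A.max() and B[i, j] >= B.max()

def is_saddle(A, B, i, j):
    # A Nash point where a unilateral deviation by the opponent never
    # lowers a player's payoff.
    return (is_nash(A, B, i, j)
            and A[i, :].min() >= A[i, j]
            and B[:, j].min() >= B[i, j])

A1 = np.array([[4, 0], [0, 2]]); B1 = A1.copy()  # coordination-style game
A2 = np.array([[3, 1], [4, 2]]); B2 = -A2        # zero-sum game

print(is_global_optimal(A1, B1, 0, 0))  # (Up, Left): → True
print(is_saddle(A2, B2, 1, 1))          # (Down, Right): → True
```

In the first game (Up, Left) is not a saddle point, and in the second (Down, Right) is not globally optimal, so mixing the two selection rules across updates would break the same-type requirement above.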
Experimentation framework
● Two grid-world games
● Motivation for grid games: state-specific actions, qualitative transitions, immediate and long-term rewards
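A minimal two-agent grid-game environment in that spirit (the layout, goals, and reward values are illustrative, not the paper's exact games):

```python
class GridGame:
    """Two agents on a 3x3 grid, each heading for its own goal corner.
    Actions: 0=up, 1=down, 2=left, 3=right. If both agents target the
    same cell they bounce back (a qualitative transition), and reaching
    a goal pays an immediate reward, so short-term moves interact with
    long-term payoffs."""
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self):
        self.pos = [(2, 0), (2, 2)]    # bottom-left and bottom-right corners
        self.goals = [(0, 2), (0, 0)]  # opposite top corners

    def step(self, a1, a2):
        targets = []
        for p, a in zip(self.pos, (a1, a2)):
            r, c = p[0] + self.MOVES[a][0], p[1] + self.MOVES[a][1]
            targets.append((r, c) if 0 <= r < 3 and 0 <= c < 3 else p)
        if targets[0] == targets[1]:   # collision: both agents stay put
            targets = list(self.pos)
        self.pos = targets
        rewards = [100 if p == g else 0 for p, g in zip(self.pos, self.goals)]
        return tuple(self.pos), rewards

g = GridGame()
state, rewards = g.step(0, 0)  # both move up, no conflict
print(state, rewards)          # → ((1, 0), (1, 2)) [0, 0]
```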
Learning process
● Violates Assumption 3, the monotonic selection of global optima or saddle points
● Still converges in most cases regardless of the selection
● Offline and online learning are rated separately
[Slides 14–16 contain only figures; no transcript text.]
Conclusions
● No current method provides performance guarantees for general-sum stochastic games
● Works as a starting point
  – There are other promising variants
  – The Nash equilibrium concept itself can be refined
References
● [7] Hu, J. and Wellman, M.P. (2003). Nash Q-Learning for General-Sum Stochastic Games. Journal of Machine Learning Research, 4:1039–1069.