Slides: mlanctot.info/open_spiel-tutorial-kuleuven-mar11-2020.pdf
Introduction to OpenSpiel
Marc Lanctot
Joint work with Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, Jonah Ryan-Davis, and several external contributors!
Private & Confidential
Many, many great collaborators!
Intro to OpenSpiel (Released Aug ’19)
● Open source framework for research on RL, search, and planning in games
● Main implementation in C++ and Python. Also:
   ○ Swift
   ○ Julia (contributed post-release)
● >25 games
● >10 algorithms
OpenSpiel
Supports:
● n-player games
● Zero-sum, cooperative, general-sum
● Perfect / imperfect information
● Simultaneous-move games
Tour of OpenSpiel
Main web site: github.com/deepmind/open_spiel/
(Link to open colab on the main site)
● Contributors
● Games
● Algorithms
OpenSpiel: Example Viz (Kuhn Poker)
OpenSpiel: Example Viz (Replicator dynamics)
Motivation: Why another games / RL library?
1. Promote work on general multiagent RL
   a. “Arcade Learning Environment” of multiagent/games
   b. General game-learning
2. Games have specific requirements and use cases:
   a. Illegal moves, turn-based, etc.
3. Connecting research communities!
4. Open code, metrics, communication, progress
5. Reproducibility in research
OpenSpiel: Design & Code
Design Philosophy
1. Keep it simple.
2. Keep it light.
Main structure:
● C++ core + Python API
● Swift port
● Julia API
● Go API (in the works)
● Games in C++
● Algs in C++ and Python
● Many examples / colab
Example
Object-Oriented API
● Game
   ○ NewInitialState() returns a State
   ○ Subclasses: SimMoveGame, NormalFormGame, MatrixGame
● State
● GameType
   ○ dynamics
   ○ information
   ○ utility
   ○ reward_model
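The object model on this slide can be illustrated with a minimal sketch. The classes `ToyNimGame` and `ToyNimState` below are hypothetical stand-ins written for this transcript, not OpenSpiel code. They only mimic the pattern of a game object that creates states, and the snake_case method names are an assumption chosen to mirror `NewInitialState()`.

```python
import random


class ToyNimState:
    """Hypothetical state: a pile of stones, two players alternate moves."""

    def __init__(self, stones):
        self._stones = stones
        self._player = 0      # index of the player to move
        self._winner = None   # set once the last stone is taken

    def current_player(self):
        return self._player

    def legal_actions(self):
        # A move removes 1 or 2 stones, never more than remain.
        return [n for n in (1, 2) if n <= self._stones]

    def apply_action(self, action):
        if action not in self.legal_actions():
            raise ValueError(f"illegal action: {action}")
        self._stones -= action
        if self._stones == 0:
            self._winner = self._player  # taking the last stone wins
        self._player = 1 - self._player

    def is_terminal(self):
        return self._stones == 0

    def returns(self):
        # Zero-sum: +1 for the winner, -1 for the loser, 0 before the end.
        if not self.is_terminal():
            return [0.0, 0.0]
        return [1.0 if p == self._winner else -1.0 for p in (0, 1)]


class ToyNimGame:
    """Hypothetical game object whose only job is to create initial states."""

    def __init__(self, stones=5):
        if stones < 1:
            raise ValueError("need at least one stone")
        self._stones = stones

    def new_initial_state(self):
        return ToyNimState(self._stones)


def play_random_episode(game, rng):
    """Plays uniformly random legal moves until the state is terminal."""
    state = game.new_initial_state()
    while not state.is_terminal():
        state.apply_action(rng.choice(state.legal_actions()))
    return state.returns()


if __name__ == "__main__":
    print(play_random_episode(ToyNimGame(), random.Random(0)))
```

Keeping the game (rules and parameters) separate from the state (one position in one episode) lets generic code such as `play_random_episode` run on any game that offers the same handful of methods, including handling of illegal moves.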
OpenSpiel Live Demo, Part 1
● Showcase basic OpenSpiel core API by example via Python interpreter.
● Feel free to follow along in colab or locally!
● Transcript of demo: demo1.txt
Multi-Agent and AI
Multiagent Learning Dynamics
Singh, Kearns & Mansour ’03, Infinitesimal Gradient Ascent (IGA)
Multiagent Learning Dynamics
Formalize optimization as a dynamical system: policy gradients
Analyze using well-established techniques
Image from Singh, Kearns, & Mansour ’03
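To make the "optimization as a dynamical system" idea concrete, here is a minimal sketch of gradient ascent in a two-player, two-action game. It is an illustration under stated assumptions, not code from the talk or from OpenSpiel: it uses a small finite step size to approximate the infinitesimal limit, clips probabilities to [0, 1], and the function names and the Prisoner's Dilemma payoffs are chosen here for the example.

```python
def iga_step(alpha, beta, row_payoff, col_payoff, step):
    """One projected gradient-ascent step for a 2x2 game.

    alpha / beta: probability that the row / column player picks action 0.
    row_payoff[i][j], col_payoff[i][j]: payoffs when row plays i, column plays j.
    """
    r, c = row_payoff, col_payoff
    # Partial derivative of the row player's expected payoff w.r.t. alpha.
    grad_alpha = beta * (r[0][0] - r[0][1] - r[1][0] + r[1][1]) + (r[0][1] - r[1][1])
    # Partial derivative of the column player's expected payoff w.r.t. beta.
    grad_beta = alpha * (c[0][0] - c[0][1] - c[1][0] + c[1][1]) + (c[1][0] - c[1][1])

    def clip(p):
        # Project back onto the valid probability range.
        return min(1.0, max(0.0, p))

    return clip(alpha + step * grad_alpha), clip(beta + step * grad_beta)


def run_iga(alpha, beta, row_payoff, col_payoff, step=0.01, iterations=500):
    """Returns the list of (alpha, beta) points visited, start included."""
    trajectory = [(alpha, beta)]
    for _ in range(iterations):
        alpha, beta = iga_step(alpha, beta, row_payoff, col_payoff, step)
        trajectory.append((alpha, beta))
    return trajectory


# Prisoner's Dilemma (illustrative payoffs): action 0 = cooperate, 1 = defect.
PD_ROW = [[3, 0], [5, 1]]
PD_COL = [[3, 5], [0, 1]]

if __name__ == "__main__":
    print(run_iga(0.5, 0.5, PD_ROW, PD_COL)[-1])
```

Plotting the returned trajectory for many starting points gives the kind of vector-field picture used to analyze such dynamics.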
Replicator Dynamics
→ Evolutionary Game Theory: replicator dynamics
Terms of the replicator equation, as annotated on the slide:
● time derivative
● utility of action a against the joint policy / population of other players
● Expected / average utility of the joint policy / population
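The equation itself was an image on the slide and is not part of this transcript. As a hedged illustration, the sketch below assumes the standard single-population form for a symmetric matrix game with payoff matrix $A$ and population $x$: $\dot{x}_a = x_a\,[(Ax)_a - x^\top A x]$, where $(Ax)_a$ plays the role of the utility of action $a$ and $x^\top A x$ the average utility. The function names, the Euler integration, and the example payoffs are assumptions made for this example; the sketch does not use OpenSpiel's own dynamics code.

```python
def replicator_derivative(x, payoff):
    """Time derivative of each action's share under replicator dynamics."""
    n = len(x)
    # Utility of each action against the current population.
    fitness = [sum(payoff[a][b] * x[b] for b in range(n)) for a in range(n)]
    # Expected (average) utility of the population.
    average = sum(x[a] * fitness[a] for a in range(n))
    return [x[a] * (fitness[a] - average) for a in range(n)]


def simulate(x, payoff, step=0.01, iterations=5000):
    """Integrates the dynamics with fixed-step Euler updates."""
    for _ in range(iterations):
        dx = replicator_derivative(x, payoff)
        x = [xa + step * dxa for xa, dxa in zip(x, dx)]
    return x


# Rock-Paper-Scissors: the uniform population is a rest point.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
# Prisoner's Dilemma as a symmetric game: action 0 = cooperate, 1 = defect.
PD = [[3, 0], [5, 1]]

if __name__ == "__main__":
    print(replicator_derivative([1 / 3, 1 / 3, 1 / 3], RPS))
    print(simulate([0.9, 0.1], PD))
```

Actions that do better than the population average grow in share and the others shrink, which is why the defecting share takes over in the Prisoner's Dilemma run.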
Phase Portraits
Bloembergen et al. 2015
OpenSpiel Live Demo, Part 2
● Matrix games and learning dynamics
● Feel free to follow along in colab or locally!
● Transcript of demo: demo2.txt
A simple MDP
Root state: A
Actions at A: a b
Successor states: B C D E
Pr(B | A, a) = 0.75, Pr(C | A, a) = 0.25
Pr(D | A, b) = 0.4, Pr(E | A, b) = 0.6
Leaf actions (left to right): c d e f g h i j
Leaf rewards (left to right): 3 -1 0 2 4 2 -3 2
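As a worked example, the optimal action at A can be computed by backing up values through this tree. This is a sketch under assumptions read off the slide layout: the eight rewards attach left to right to actions c through j, each successor state owns two of those actions (c, d at B; e, f at C; g, h at D; i, j at E), and rewards are undiscounted and received only at the leaves.

```python
# Transition probabilities from the slide: Pr(next_state | A, action).
TRANSITIONS = {
    "a": {"B": 0.75, "C": 0.25},
    "b": {"D": 0.4, "E": 0.6},
}

# Assumed mapping of leaf actions to rewards (left-to-right order on the slide).
REWARDS = {
    "B": {"c": 3, "d": -1},
    "C": {"e": 0, "f": 2},
    "D": {"g": 4, "h": 2},
    "E": {"i": -3, "j": 2},
}


def state_value(state):
    """Value of a second-level state: the best reward available there."""
    return max(REWARDS[state].values())


def action_value(action):
    """Expected value of taking `action` at the root state A."""
    return sum(p * state_value(s) for s, p in TRANSITIONS[action].items())


Q_AT_ROOT = {action: action_value(action) for action in TRANSITIONS}
BEST_ACTION = max(Q_AT_ROOT, key=Q_AT_ROOT.get)

if __name__ == "__main__":
    print(Q_AT_ROOT, BEST_ACTION)
```

Under these assumptions the values are 0.75 * 3 + 0.25 * 2 = 2.75 for a and 0.4 * 4 + 0.6 * 2 = 2.8 for b, so b is slightly better.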
A simple Multiagent System (the same tree as the simple MDP)
Chance is a player with a fixed stochastic policy!
Terminal history A.K.A. Episode
(A, a, F, 1, B, c) is a terminal history.
(A, b, G, 3, D, g) is another terminal history.
Prefix (non-terminal) Histories
(A, a, F, 2, C) is a history. It is a prefix of (A, a, F, 2, C, e) and (A, a, F, 2, C, f).
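The history notation can be reproduced with a short enumeration. This is a sketch with assumptions: F and G are taken to be the chance nodes that follow actions a and b, chance outcomes 1, 2, 3 lead to B, C, D as in the examples on the slides, and outcome 4 leading to E is a guess, since no slide shows it.

```python
# Chance node reached after each root action, with outcome -> next state.
# Outcome 4 -> E is an assumption; the slides only show outcomes 1, 2 and 3.
CHANCE = {
    "a": ("F", {1: "B", 2: "C"}),
    "b": ("G", {3: "D", 4: "E"}),
}

# Actions available at each second-level state (left-to-right on the slide).
LEAF_ACTIONS = {
    "B": ["c", "d"],
    "C": ["e", "f"],
    "D": ["g", "h"],
    "E": ["i", "j"],
}


def terminal_histories():
    """Enumerates every terminal history (episode) of the tree."""
    histories = []
    for action, (chance_node, outcomes) in CHANCE.items():
        for outcome, state in outcomes.items():
            for leaf_action in LEAF_ACTIONS[state]:
                histories.append(
                    ("A", action, chance_node, outcome, state, leaf_action))
    return histories


def is_prefix(prefix, history):
    """True if `history` starts with `prefix`."""
    return history[:len(prefix)] == prefix


if __name__ == "__main__":
    prefix = ("A", "a", "F", 2, "C")
    for history in terminal_histories():
        if is_prefix(prefix, history):
            print(history)
```

Treating chance as one more player means its outcomes sit in the history like any other action, so a history fully determines the state of the environment.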
Partially Observable Zero-Sum Games
Kuhn (simplified) poker
● Players start w/ 2 chips
● Each: ante 1 chip
● 3-card deck
● 2 actions: pass, bet
● Reward: money diff
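The rules above are compact enough to write the terminal payoffs as a single function. The sketch below assumes the standard Kuhn poker betting sequences (a bet costs one extra chip, and passing after a bet is a fold), which the slide does not spell out. The function name, the 0-indexed players, and the encoding of actions as the letters p and b are choices made for this example.

```python
from itertools import permutations

# Every way a hand can end: p = pass, b = bet, in order of play.
TERMINAL_SEQUENCES = ("pp", "bp", "pbp", "bb", "pbb")


def kuhn_returns(cards, actions):
    """Chip gain or loss for players 0 and 1 at the end of a hand.

    cards: (card of player 0, card of player 1), higher number wins a showdown.
    actions: one of TERMINAL_SEQUENCES.
    """
    higher = 0 if cards[0] > cards[1] else 1
    if actions == "pp":              # both pass: showdown for the ante
        winner, amount = higher, 1
    elif actions == "bp":            # player 1 folds to a bet
        winner, amount = 0, 1
    elif actions == "pbp":           # player 0 folds to a bet
        winner, amount = 1, 1
    elif actions in ("bb", "pbb"):   # bet and call: showdown for ante + bet
        winner, amount = higher, 2
    else:
        raise ValueError(f"not a terminal sequence: {actions}")
    return [amount if p == winner else -amount for p in (0, 1)]


if __name__ == "__main__":
    for deal in permutations(range(3), 2):
        for sequence in TERMINAL_SEQUENCES:
            print(deal, sequence, kuhn_returns(deal, sequence))
```

The largest possible loss is 2 chips (ante plus one bet), which matches the 2-chip starting stack, and the two returns always sum to zero.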
Terminology
● An information state corresponds to a sequence of observations
   ○ with respect to the player to play at that state
Ante: 1 chip per player, [private card: private observation], P1 bets (raise)
Environment is in one of many world states
full history of actions (including nature’s!!)
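The difference between an information state and a world state can be shown by grouping Kuhn poker deals by what one player can observe. This is an illustrative sketch, not OpenSpiel's representation: an information state is modelled here as the pair (own card, public betting so far), a world state as the pair (full deal, betting), and players are 0-indexed.

```python
from collections import defaultdict
from itertools import permutations

CARDS = ("J", "Q", "K")


def information_state(player, deal, betting):
    """What `player` has observed: their own card plus the public betting."""
    return (deal[player], betting)


def group_world_states(player, betting=""):
    """Maps each information state of `player` to the world states it contains."""
    groups = defaultdict(list)
    for deal in permutations(CARDS, 2):   # nature's action: the deal
        world_state = (deal, betting)     # full history, hidden cards included
        groups[information_state(player, deal, betting)].append(world_state)
    return dict(groups)


if __name__ == "__main__":
    # The second player (index 1) after the first player has bet.
    for info_state, worlds in group_world_states(1, "b").items():
        print(info_state, "->", worlds)
```

Each of the three information states contains two world states, because the player cannot tell which of the two remaining cards the opponent holds.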
OpenSpiel Live Demo, Part 3
● Imperfect information games, information state strings + vectors
● Feel free to follow along in colab or locally!
● Transcript of demo: demo3.txt
Query API
● OpenSpiel is designed to be a generic API (breadth vs. depth)
● However, sometimes domain-specific knowledge is required.
OpenSpiel provides a query API to get knowledge about states:
- query.h, query.cc
- python/pybind11/pyspiel.cc
Currently only one game uses this.
Best File References
First example and API references:
● examples/example.cc
● python/examples/example.py
● python/examples/matrix_game_example.py
● python/egt/dynamics_test.py
● python/examples/kuhn_policy_gradient.py
● python/examples/tic_tac_toe_qlearner.py
● python/examples/independent_tabular_qlearning.py
Demo transcripts: demo1.txt, demo2.txt, demo3.txt
Thank You!
● Paper: https://arxiv.org/abs/1908.09453
● GitHub: github.com/deepmind/open_spiel/