Go in numbers
3,000 Years Old
40M Players
10^170 Positions
The Rules of Go
Capture Territory
Why is Go hard for computers to play?
Game tree complexity = b^d
Brute force search intractable:
1. Search space is huge
2. “Impossible” for computers to evaluate who is winning
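The b^d formula can be made concrete with a short calculation. The breadth and depth figures below (b ≈ 35, d ≈ 80 for chess; b ≈ 250, d ≈ 150 for Go) are commonly quoted approximations that do not appear on the slide, so treat them as assumptions. The function name is illustrative.

```python
import math


def log10_game_tree_size(breadth: float, depth: float) -> float:
    """Return log10(b^d), computed as d * log10(b) to avoid huge integers."""
    return depth * math.log10(breadth)


# Approximate figures (assumptions, not taken from the slides).
GAMES = {"chess": (35, 80), "go": (250, 150)}

for name, (b, d) in GAMES.items():
    exponent = log10_game_tree_size(b, d)
    print(f"{name}: b^d is roughly 10^{exponent:.0f}")
```

Working in log space keeps the comparison readable: the gap between the two games is a difference in exponents, not a ratio of astronomically large integers.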
Convolutional neural network
Value network
Position s → Evaluation v(s)
Policy network
Position s → Move probabilities p(a|s)
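As a rough illustration of what "move probabilities p(a|s)" means in practice, the sketch below turns raw per-move scores into a probability distribution over legal moves. The scores, the legality mask, and the function name are hypothetical; a real policy network would produce the scores from the board position.

```python
import math


def move_probabilities(logits, legal):
    """Softmax over legal moves only; illegal moves get probability 0."""
    # Subtract the largest legal score for numerical stability.
    top = max(score for score, ok in zip(logits, legal) if ok)
    exps = [math.exp(score - top) if ok else 0.0 for score, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for four candidate moves; the third move is illegal.
probs = move_probabilities([2.0, 1.0, 5.0, 0.5], [True, True, False, True])
print(probs)
```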
Exhaustive search
Monte-Carlo rollouts
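A minimal sketch of rollout-based evaluation, using a toy subtraction game rather than Go: a position is scored by playing random games to the end and averaging the outcomes. The game, function names, and rollout count are all assumptions chosen to keep the example small.

```python
import random


def rollout(stones: int, rng: random.Random) -> int:
    """Play uniformly random moves to the end of a toy subtraction game.

    Players alternately take 1 to 3 stones; whoever takes the last stone wins.
    Requires stones >= 1. Returns +1 if the player to move at the start wins,
    otherwise -1.
    """
    to_move = 0  # 0 = player to move at the start, 1 = the opponent
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return 1 if to_move == 0 else -1
        to_move = 1 - to_move


def rollout_value(stones: int, n_rollouts: int = 1000, seed: int = 0) -> float:
    """Estimate a position's value as the mean outcome of random rollouts."""
    rng = random.Random(seed)
    return sum(rollout(stones, rng) for _ in range(n_rollouts)) / n_rollouts


print(rollout_value(1), rollout_value(10))
```

With one stone left the player to move always wins, so the estimate is exactly 1.0; for larger positions the estimate reflects random play, not optimal play.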
Reducing depth with value network
Reducing breadth with policy network
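The effect of the two reductions can be seen by counting nodes in a uniform tree. This is only an arithmetic sketch: the breadth and depth values are small made-up numbers, and real search trees are not uniform.

```python
def nodes_in_tree(breadth: int, depth: int) -> int:
    """Number of nodes in a uniform tree: 1 + b + b^2 + ... + b^d."""
    return sum(breadth ** level for level in range(depth + 1))


full = nodes_in_tree(breadth=10, depth=6)     # exhaustive search
shallow = nodes_in_tree(breadth=10, depth=3)  # stop early, score leaves with v(s)
narrow = nodes_in_tree(breadth=3, depth=6)    # expand only the top 3 moves under p(a|s)
both = nodes_in_tree(breadth=3, depth=3)      # both reductions combined

print(full, shallow, narrow, both)
```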
Deep reinforcement learning in AlphaGo
Human expert positions → Supervised Learning policy network → Reinforcement Learning policy network → Self-play data → Value network
Supervised learning of policy networks
Policy network: 12-layer convolutional neural network
Training data: 30M positions from human expert games (KGS 5+ dan)
Training algorithm: maximise likelihood by stochastic gradient descent
Training time: 4 weeks on 50 GPUs using Google Cloud
Results: 57% accuracy on held-out test data (state of the art was 44%)
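"Maximise likelihood by stochastic gradient descent" can be sketched for a single position with a softmax policy. Here the logits themselves are the trainable parameters, which is a simplifying assumption; in a convolutional network the same gradient would be backpropagated through the layers. The learning rate and the three-move position are hypothetical.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(score - top) for score in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(logits, expert_move, lr=0.5):
    """One gradient-ascent step on log p(expert_move) for a single position.

    For a softmax policy, d log p(a*) / d logit_i = 1[i == a*] - p_i.
    """
    probs = softmax(logits)
    return [
        score + lr * ((1.0 if i == expert_move else 0.0) - p)
        for i, (score, p) in enumerate(zip(logits, probs))
    ]


logits = [0.0, 0.0, 0.0]  # hypothetical 3-move position, uniform start
for _ in range(20):
    logits = sgd_step(logits, expert_move=1)

print(softmax(logits))
```

Each step raises the probability of the expert's move and lowers the others, which is what maximising the likelihood of the expert data amounts to.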
Reinforcement learning of policy networks
Policy network: 12-layer convolutional neural network
Training data: games of self-play between policy networks
Training algorithm: maximise wins z by policy gradient reinforcement learning
Training time: 1 week on 50 GPUs using Google Cloud
Results: 80% win rate vs the supervised learning policy network. Raw network ~3 amateur dan.
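A minimal sketch of a policy-gradient (REINFORCE-style) update driven by the game outcome z. It is not self-play: a single decision with fixed, hypothetical win probabilities per move stands in for whole games, and the logits are the only parameters. The learning rate and game count are arbitrary.

```python
import math
import random


def softmax(logits):
    top = max(logits)
    exps = [math.exp(score - top) for score in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits, action, z, lr=0.1):
    """Policy-gradient step: move logits along z * grad log p(action)."""
    probs = softmax(logits)
    return [
        score + lr * z * ((1.0 if i == action else 0.0) - p)
        for i, (score, p) in enumerate(zip(logits, probs))
    ]


def train(win_prob, n_games=2000, seed=0):
    """Toy stand-in for self-play: each move wins (z = +1) with a fixed probability."""
    rng = random.Random(seed)
    logits = [0.0] * len(win_prob)
    for _ in range(n_games):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        z = 1 if rng.random() < win_prob[action] else -1
        logits = reinforce_update(logits, action, z)
    return softmax(logits)


final_policy = train(win_prob=[0.2, 0.8])
print(final_policy)
```

Moves that tend to be followed by wins have their probability increased, and moves followed by losses have it decreased.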
Reinforcement learning of value networks
Value network: 12-layer convolutional neural network
Training data: 30 million games of self-play
Training algorithm: minimise MSE by stochastic gradient descent
Training time: 1 week on 50 GPUs using Google Cloud
Results: First strong position evaluation function - previously thought impossible
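"Minimise MSE by stochastic gradient descent" can be illustrated by regressing a predicted value onto game outcomes z. The sketch below fits a one-feature linear model, which is an assumption made for brevity; the feature values, outcomes, learning rate, and epoch count are all made up.

```python
import random


def train_value(samples, lr=0.05, epochs=200, seed=0):
    """Fit v(x) = w * x + b to outcomes z by SGD on the squared error (v - z)^2."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, z in data:
            err = (w * x + b) - z  # prediction error on one sample
            w -= lr * 2 * err * x  # gradient of err^2 with respect to w
            b -= lr * 2 * err      # gradient of err^2 with respect to b
    return w, b


# Hypothetical data: x is a hand-made feature of a position, z the game outcome.
samples = [(-1.0, -1), (-0.5, -1), (0.5, 1), (1.0, 1)]
w, b = train_value(samples)
mse = sum((w * x + b - z) ** 2 for x, z in samples) / len(samples)
print(w, b, mse)
```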
Monte-Carlo tree search in AlphaGo: selection
P: prior probability
Q: action value
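A hedged sketch of how selection might combine Q and P: each simulation picks the move maximising Q plus an exploration bonus u that grows with the prior P and shrinks with the visit count N. The exact form of u and the constant `c_puct` are assumptions modelled on the PUCT-style rule described for AlphaGo; the slide itself only names P and Q. The `Edge` class and the example statistics are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float            # P: prior probability from the policy network
    visits: int = 0         # N: number of simulations through this edge
    value_sum: float = 0.0  # sum of evaluations backed up through this edge

    @property
    def q(self) -> float:
        """Q: mean evaluation of simulations through this edge (0 if unvisited)."""
        return self.value_sum / self.visits if self.visits else 0.0


def select(edges: dict, c_puct: float = 1.0):
    """Pick the move maximising Q + u, with u proportional to P / (1 + N)."""
    total = sum(edge.visits for edge in edges.values())

    def score(edge: Edge) -> float:
        return edge.q + c_puct * edge.prior * math.sqrt(total) / (1 + edge.visits)

    return max(edges, key=lambda move: score(edges[move]))


edges = {
    "A": Edge(prior=0.6, visits=10, value_sum=2.0),
    "B": Edge(prior=0.3, visits=1, value_sum=0.5),
    "C": Edge(prior=0.1),
}
print(select(edges))
```

Move A has the highest prior but has already been visited often, so the less-explored move B with a better mean value is selected.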
Monte-Carlo tree search in AlphaGo: expansion
Policy network p
P: prior probability
Monte-Carlo tree search in AlphaGo: evaluation
Value network v
Monte-Carlo tree search in AlphaGo: rollout
Value network v
Game scorer r
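One way the two leaf signals could be combined is a weighted average of the value network output v and the rollout result r. The equal weighting used as the default below is an assumption, as is the function name; the slide only shows that both quantities are computed.

```python
def leaf_evaluation(v: float, r: float, mix: float = 0.5) -> float:
    """Blend the value network output v with the rollout outcome r."""
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return (1.0 - mix) * v + mix * r


# Hypothetical leaf: the value network is mildly optimistic, the rollout was a win.
print(leaf_evaluation(v=0.2, r=1.0))
```

Setting `mix` to 0 uses only the value network, and setting it to 1 uses only rollouts.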
Monte-Carlo tree search in AlphaGo: backup
Value network v
Game scorer r
Q: action value
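The backup step can be sketched as updating running statistics along the path a simulation took, so that Q becomes the mean of all leaf evaluations seen through each edge. The dictionary layout (keys "N", "W", "Q") is hypothetical, and all values are taken from one fixed player's viewpoint, which is a simplifying assumption.

```python
def backup(path, leaf_value: float) -> None:
    """Propagate one leaf evaluation to every edge on the simulated path."""
    for edge in path:
        edge["N"] += 1                     # visit count
        edge["W"] += leaf_value            # running sum of evaluations
        edge["Q"] = edge["W"] / edge["N"]  # action value = mean evaluation


# Three edges from root to leaf, all unvisited at first.
path = [{"N": 0, "W": 0.0, "Q": 0.0} for _ in range(3)]
backup(path, 1.0)       # first simulation reaches the deepest edge
backup(path[:2], -1.0)  # second simulation stops one edge earlier
print(path)
```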
Deep Blue
Handcrafted chess knowledge
Alpha-beta search guided by heuristic evaluation function
200 million positions / second
AlphaGo
Knowledge learned from expert games and self-play
Monte-Carlo search guided by policy and value networks
60,000 positions / second
Nature AlphaGo Seoul AlphaGo
Evaluating Nature AlphaGo against computers
[Figure: Elo rating chart (axis 0 to 3500) for the Go programs Gnu Go, Fuego, Pachi, Zen, Crazy Stone and AlphaGo, with a rank scale from beginner kyu (k) through amateur dan (d) to professional dan (p)]
494/495 against computer opponents
>75% winning rate with 4 stone handicap
Even stronger using distributed machines
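For readers unfamiliar with Elo ratings, the sketch below shows the standard logistic Elo model with a 400-point scale, which relates a rating gap to an expected score. That the chart uses exactly this convention is an assumption, and the function names are illustrative. Converting the pooled 494/495 record into a single gap is only indicative, since the opponents have different ratings.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_gap(win_rate: float) -> float:
    """Elo gap implied by a win rate strictly between 0 and 1."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


print(expected_score(1500, 1500))
print(rating_gap(494 / 495))
```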
Evaluating Nature AlphaGo against humans
Fan Hui (2p): European Champion 2013 - 2016
Match was played in October 2015
AlphaGo won the match 5-0
First program ever to beat a professional on a full-size 19x19 board in an even game
Seoul AlphaGo
Deep Reinforcement Learning (as Nature AlphaGo)
● Improved value network
● Improved policy network
● Improved search
● Improved hardware (TPU vs GPU)
Evaluating Seoul AlphaGo against computers
[Figure: Elo rating chart (axis 0 to 4500) for the Go programs Gnu Go, Fuego, Pachi, Zen, Crazy Stone, AlphaGo (Nature v13) and AlphaGo (Seoul v18), with a rank scale from beginner kyu (k) through amateur dan (d) to professional dan (p)]
>50% against Nature AlphaGo with 3 to 4 stone handicap
CAUTION: ratings based on self-play results
Evaluating Seoul AlphaGo against humans
Lee Sedol (9p): winner of 18 world titles
Match was played in Seoul, March 2016
AlphaGo won the match 4-1
AlphaGo vs Lee Sedol: Game 1
AlphaGo vs Lee Sedol: Game 2
AlphaGo vs Lee Sedol: Game 3
AlphaGo vs Lee Sedol: Game 4
AlphaGo vs Lee Sedol: Game 5
Deep Reinforcement Learning: Beyond AlphaGo
What’s Next?
Team
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel & Demis Hassabis
With thanks to: Lucas Baker, David Szepesvari, Malcolm Reynolds, Ziyu Wang, Nando De Freitas, Mike Johnson, Ilya Sutskever, Jeff Dean, Mike Marty, Sanjay Ghemawat.
Dave Silver
Chris Maddison
Arthur Guez
Laurent Sifre
George van den Driessche
Julian Schrittwieser
Ioannis Antonoglou
Veda Panneershelvam
Marc Lanctot
Sander Dieleman
Dominik Grewe
John Nham
Nal Kalchbrenner
Tim Lillicrap
Maddy Leach
Koray Kavukcuoglu
Thore Graepel
Aja Huang
Demis Hassabis
Yutian Chen
Demis Hassabis