05/03/2023 1
Mastering the game of Go with deep neural networks and tree search
Speaker: San-Feng Chang
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D.
Nature, 529(7587):484–489, 2016.
Outline
• AI in Game Playing
• Previous Work of Go Research
• Architecture of AlphaGo
• AlphaGo's Methods
• The Playing Strength of AlphaGo
• Conclusion
AI in Game Playing (1/3)
• Game playing is a standard benchmark for measuring the performance of an AI.
• One classification of outcomes for an AI test:
  – Optimal: it is not possible to perform better
  – Strong super-human: performs better than all humans
  – Super-human: performs better than most humans
  – Sub-human: performs worse than most humans
AI in Game Playing (2/3)

| Game  | Players                      | Branching Factor | Depth (game length) | Complexity        |
|-------|------------------------------|------------------|---------------------|-------------------|
| Chess | Deep Blue vs Kasparov (1997) | 35               | 80                  | 35^80 ≈ 10^123    |
| Go    | AlphaGo vs Lee Sedol (2016)  | 250              | 150                 | 250^150 ≈ 10^360  |

Evolution of game-tree search:
Brute Force → Minimax & Alpha-Beta → MCTS → AlphaGo's Method
AI in Game Playing (3/3)
• Minimax & Alpha-Beta Pruning
  – Alpha-beta pruning skips branches that cannot affect the minimax value, but the complexity is still too high for Go.
https://upload.wikimedia.org/wikipedia/commons/thumb/9/91/AB_pruning.svg/1280px-AB_pruning.svg.png?1458451165542
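The minimax search with alpha-beta pruning described above can be sketched on a small explicit game tree; the tree and its leaf values below are illustrative, not taken from the slides:

```python
# Minimax with alpha-beta pruning on an explicit game tree.
# Leaves are numbers (static evaluations); internal nodes are lists of children.
def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):          # leaf: return its static evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:               # beta cutoff: the minimizer avoids this branch
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:               # alpha cutoff: the maximizer avoids this branch
                break
        return value

tree = [[3, 5], [2, [9, 1]], [0, -1]]
print(alphabeta(tree, float("-inf"), float("inf"), True))   # 3
```

The cutoffs prune subtrees without changing the result, but the remaining tree is still exponential in depth, which is why Go needs something stronger.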
Previous Work of Go Research (1/4)
• Monte Carlo rollouts search to maximum depth without branching at all, by sampling long sequences of actions for both players from a policy p.
• Monte Carlo tree search (MCTS) uses Monte Carlo rollouts to estimate the value of each state in a search tree.
Previous Work of Go Research (2/4)
• Monte Carlo Tree Search:
  – Selection: starting from the root (e.g., 2 wins / 3 visits, with children 1/1 and 1/2), descend the tree, alternating between Player 1 and Player 2, until a node with untried moves is reached.
  – Expansion: add a new child node (0/0) for one of the untried moves.
Previous Work of Go Research (3/4)
• Monte Carlo Tree Search (continued):
  – Simulation: from the newly expanded node, play out the rest of the game with random rollout moves.
  – Back-propagation: propagate the result back up the selected path, incrementing each node's visit count and the win count of the winner's nodes (e.g., the root goes from 2/3 to 3/4).
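The four MCTS phases above can be sketched as a minimal UCB1-based search. The toy game here (a pile of stones; take 1 or 2 per turn; whoever takes the last stone wins) is an illustrative stand-in for Go, not anything from the slides:

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state = state        # (stones_left, player_to_move)
        self.parent = parent
        self.children = {}        # move -> Node
        self.wins = 0.0           # wins for the player who moved INTO this node
        self.visits = 0

def moves(state):
    return [m for m in (1, 2) if m <= state[0]]

def step(state, move):
    return (state[0] - move, 1 - state[1])

def rollout(state):
    while state[0] > 0:           # random playout to the end of the game
        state = step(state, random.choice(moves(state)))
    return 1 - state[1]           # the previous mover took the last stone and won

def mcts(root_state, iters=500, c=1.4):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1. Selection: descend by UCB1 while the node is fully expanded
        while node.state[0] > 0 and len(node.children) == len(moves(node.state)):
            node = max(node.children.values(),
                       key=lambda n: n.wins / n.visits
                       + c * math.sqrt(math.log(node.visits) / n.visits))
        # 2. Expansion: add one untried move
        if node.state[0] > 0:
            m = random.choice([m for m in moves(node.state) if m not in node.children])
            node.children[m] = Node(step(node.state, m), node)
            node = node.children[m]
        # 3. Simulation
        winner = rollout(node.state) if node.state[0] > 0 else 1 - node.state[1]
        # 4. Back-propagation
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.state[1]:
                node.wins += 1
            node = node.parent
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

random.seed(1)
print(mcts((2, 0)))   # with 2 stones left, taking both wins immediately
```

With no domain knowledge beyond the rules, the visit counts concentrate on the winning move, which is exactly the behavior the strongest pre-AlphaGo Go programs relied on.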
Previous Work of Go Research (4/4)
• The strongest current Go programs are based on MCTS, enhanced by policies that are trained to predict human expert moves.
• However, prior work has been limited to shallow policies or value functions based on a linear combination of input features.
Architecture of AlphaGo
Neural Network Training Pipeline
• s: board position; a: legal moves
• p(a|s): probability distribution over moves; v(s): scalar value of the position
• Two "brains": the policy network p(a|s) and the value network v(s)
• Human expert dataset: KGS server, ~160,000 games, 29.4 million positions
Convolutional Neural Network (1/2)
A regular 3-layer neural network vs. a convolutional neural network
• Input volume of size W1 × H1 × D1
• Requires four hyperparameters:
  1. Number of filters K (depth)
  2. Spatial extent F (kernel size)
  3. Stride S
  4. Amount of zero padding P
• Output volume of size W2 × H2 × D2:
  W2 = (W1 − F + 2P)/S + 1
  H2 = (H1 − F + 2P)/S + 1
  D2 = K
• Parameter sharing: total weights = (F × F × D1) × K
http://cs231n.github.io/convolutional-networks/
Convolutional Neural Network (2/2)
• Number of filters K: 2
• Spatial extent F: 3 × 3
• Stride S: 2
• Zero padding P: 1
http://cs231n.github.io/convolutional-networks/
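Plugging these hyperparameters into the output-size formula from the previous slide (the 5 × 5 × 3 input volume is assumed from the CS231n demo, since the slide does not state it):

```python
def conv_output_size(w1, h1, d1, k, f, s, p):
    """Conv layer output volume: W2 = (W1 - F + 2P)/S + 1 (same for H), D2 = K."""
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return w2, h2, k

# Slide's hyperparameters, assumed 5x5x3 input: (5 - 3 + 2*1)/2 + 1 = 3
print(conv_output_size(5, 5, 3, k=2, f=3, s=2, p=1))   # (3, 3, 2)

# Parameter sharing: total weights = (F * F * D1) * K
print(3 * 3 * 3 * 2)                                   # 54
```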
AlphaGo’s methods – Trained by Human Expert (1/6)
• Rollout policy pπ:
  – Takes only 2 μs to select an action, but predicts expert moves with just 24.2% accuracy
  – A linear softmax of small pattern features, with weights π
• The softmax turns linear scores into a probability distribution (illustrated for three inputs):

$$n_{1,\mathrm{out}} = \frac{e^{n_{1,\mathrm{in}}}}{e^{n_{1,\mathrm{in}}} + e^{n_{2,\mathrm{in}}} + e^{n_{3,\mathrm{in}}}}$$

https://qph.fs.quoracdn.net/main-qimg-9e2d012ef7cb8b29d2bed14d2975c986
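A linear softmax policy of this shape can be sketched as follows; the pattern features and weights are illustrative placeholders, not the trained rollout-policy weights:

```python
import math

def softmax_policy(features, weights):
    """Linear softmax over candidate moves: p(a|s) proportional to exp(w . x(s,a))."""
    logits = [sum(w * x for w, x in zip(weights, f)) for f in features]
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical pattern-feature vectors for three candidate moves
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [0.5, 1.5]                         # illustrative weights (the paper's pi)
probs = softmax_policy(features, weights)    # sums to 1; favors the highest score
```

Because the policy is just a dot product followed by a softmax, evaluating it is extremely cheap, which is what makes 2 μs per action possible.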
AlphaGo’s methods – Trained by Human Expert (2/6)
• SL policy pσ:
  – Takes 3 ms to select an action, and predicts expert moves with 57.0% accuracy
  – A 13-layer convolutional neural network with weights σ
• Architecture:
  – Input: 19 × 19 board, 48 feature planes
  – 1st layer: conv + ReLU, kernel size 5 × 5
  – 2nd–12th layers: conv + ReLU, kernel size 3 × 3
  – 13th layer: kernel size 1 × 1, 1 filter, softmax
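As a sanity check, the output-size formula W2 = (W1 − F + 2P)/S + 1 shows how the 13 layers can keep the full 19 × 19 board resolution so the final softmax yields one probability per board point. The stride-1 and padding values below are assumptions chosen to preserve the size; the slide does not state them:

```python
def conv_out(w, f, s, p):
    # W2 = (W1 - F + 2P)/S + 1
    return (w - f + 2 * p) // s + 1

w = 19                              # board width (48 input feature planes)
w = conv_out(w, f=5, s=1, p=2)      # layer 1: 5x5 kernel, padding assumed to preserve size
for _ in range(11):                 # layers 2-12: 3x3 kernels
    w = conv_out(w, f=3, s=1, p=1)
w = conv_out(w, f=1, s=1, p=0)      # layer 13: 1x1 kernel, 1 filter, softmax over points
print(w)                            # 19 -> one probability per board intersection
```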
AlphaGo’s methods – Reinforcement Learning pρ (3/6)
• Initialize the RL policy network from the SL policy: ρ = ρ⁻ = σ
• Play games between the current policy pρ and an opponent pρ⁻ sampled from a pool of earlier iterations
• At the end of each game, update the weights ρ with the policy gradient method, using the game result as the reward r
• Periodically add the current pρ to the opponent pool
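The policy-gradient update can be illustrated on a toy problem. The sketch below runs a REINFORCE-style update on a two-armed bandit with a softmax policy; the rewards, learning rate, and step count are all illustrative, and it omits AlphaGo's self-play, opponent pool, and value baseline:

```python
import math, random

random.seed(0)

theta = [0.0, 0.0]            # one preference per action
alpha = 0.1                   # learning rate (illustrative)
true_reward = [0.0, 1.0]      # arm 1 is the better action

def probs(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    z = sum(e)
    return [x / z for x in e]

for _ in range(500):
    p = probs(theta)
    a = 0 if random.random() < p[0] else 1      # sample an action from the policy
    r = true_reward[a]
    for k in range(2):
        # gradient of log pi(a|theta) w.r.t. theta_k: 1{k == a} - p[k]
        grad = (1.0 if k == a else 0.0) - p[k]
        theta[k] += alpha * r * grad            # ascend reward-weighted log-likelihood

print(probs(theta))   # probability mass shifts onto the rewarded arm
```

The same reward-weighted log-likelihood ascent, applied to every move of a won or lost game, is what moves pρ from predicting human moves to maximizing the chance of winning.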
AlphaGo’s methods – Value Network vθ (4/6)
• Supervised learning (regression):
  – Used to estimate the winning rate of the current position
  – A 15-layer CNN
• Architecture:
  – Input: 19 × 19 board, 48 feature planes + 1 plane (current colour)
  – 1st–13th layers: the same as the RL policy network
  – 14th layer: fully connected, 256 ReLU units
  – 15th layer: fully connected, 1 tanh unit
AlphaGo’s methods – Value Network vθ (5/6)
• Generate training data from self-play games:
  – Randomly sample an integer U in 1 ~ 450
  – t = 1 ~ U−1: moves played by the SL policy network pσ
  – t = U: one random action
  – t = U+1 ~ end: moves played by the RL policy network pρ
• Reward: z_t = r(s_T), the terminal reward from the final position s_T
• Only a single training example (s_{U+1}, z_{U+1}) is added to the data set from each game, to avoid overfitting to strongly correlated positions.
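The sampling scheme above can be sketched with stub policies. Everything here (the pass-through move loops, the coin-flip outcome, the 5% per-move stopping chance) is hypothetical scaffolding; the point is the structure: one random sample U per game, one training pair per game:

```python
import random

random.seed(0)

def play_game(max_len=450):
    """One self-play game; returns the single training pair (s_{U+1}, z_{U+1})."""
    U = random.randint(1, max_len)
    for t in range(1, U):
        pass                       # t = 1..U-1: move chosen by SL policy p_sigma (stub)
    # t = U: one uniformly random action (stub)
    t = U + 1
    while t <= max_len and random.random() > 0.05:
        t += 1                     # t = U+1..end: moves by RL policy p_rho (stub)
    z = random.choice([-1, 1])     # terminal reward +-1 (stub for the real game result)
    return (U + 1, z)              # exactly one (state index, outcome) pair per game

dataset = [play_game() for _ in range(10)]
print(len(dataset))                # 10 games -> 10 training examples
```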
AlphaGo’s methods – Searching (6/6)
• Q: action value — the accumulated winning score of the move
• u(P): upper confidence bound — balances exploration vs. exploitation
• P: prior probability from pσ (the SL policy performed better here than the RL policy)
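The selection rule a = argmax_a [Q(s,a) + u(s,a)] can be sketched as follows. The edge statistics and the exploration constant are illustrative; u(s,a) uses the c_puct · P(s,a) · √(Σ_b N(s,b)) / (1 + N(s,a)) form from the appendix formula slide:

```python
import math

def select_move(stats, c_puct=5.0):
    """a = argmax_a [Q(s,a) + u(s,a)], u(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))."""
    total_n = sum(n for _, n, _ in stats.values())
    best, best_score = None, float("-inf")
    for a, (q, n, p) in stats.items():
        u = c_puct * p * math.sqrt(total_n) / (1 + n)
        if q + u > best_score:
            best, best_score = a, q + u
    return best

# Hypothetical edge statistics: move -> (Q value, visit count N, prior P)
stats = {"D4": (0.52, 120, 0.35), "Q16": (0.48, 40, 0.30), "K10": (0.30, 5, 0.05)}
print(select_move(stats))   # Q16: slightly lower Q, but far fewer visits
```

Note how the bonus u decays with visits: heavily explored moves must justify themselves by Q alone, while under-explored moves with a high prior keep getting tried.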
The playing strength of AlphaGo
Conclusion
• Reaching a milestone is the beginning of the next milestone.
• Stay hungry, stay foolish!
References (1/2)
• Nature:
  – Mastering the game of Go with deep neural networks and tree search
• Mark Chang:
  – http://www.slideshare.net/ckmarkohchang/alphago-in-depth
• CNN:
  – http://cs231n.github.io/convolutional-networks/
References (2/2)
• 陳鍾誠:
  – http://www.slideshare.net/ccckmit/30alphago
• Monte Carlo Tree Search:
  – https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/
• How AlphaGo Works:
  – http://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works
End
Thank You
Formula (1/2)
• Policy network: classification (SL)

$$\Delta\sigma \propto \frac{1}{m}\sum_{k=1}^{m}\frac{\partial \log p_\sigma(a^k \mid s^k)}{\partial \sigma}$$

• Policy network: reinforcement learning

$$\Delta\rho \propto \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T^i}\frac{\partial \log p_\rho(a_t^i \mid s_t^i)}{\partial \rho}\,\bigl(z_t^i - v(s_t^i)\bigr)$$

• Value network: regression

$$\Delta\theta \propto \frac{1}{m}\sum_{k=1}^{m}\bigl(z^k - v_\theta(s^k)\bigr)\frac{\partial v_\theta(s^k)}{\partial \theta}$$
Formula (2/2)
• Searching:

$$a_t = \operatorname*{argmax}_a \bigl(Q(s_t,a) + u(s_t,a)\bigr)$$

$$u(s,a) \propto \frac{P(s,a)}{1 + N(s,a)}, \qquad u(s,a) = c_{\mathrm{puct}}\,P(s,a)\,\frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)}$$

$$N(s,a) = \sum_{i=1}^{n} \mathbf{1}(s,a,i)$$

$$Q(s,a) = \frac{1}{N(s,a)}\sum_{i=1}^{n} \mathbf{1}(s,a,i)\,V(s_L^i)$$

$$V(s_L) = (1-\lambda)\,v_\theta(s_L) + \lambda\,z_L$$

• $\mathbf{1}(s,a,i)$ indicates whether edge $(s,a)$ was traversed during the $i$-th simulation
• $s_L^i$ is the leaf node reached in the $i$-th simulation
How AlphaGo selected its move
The playing strength of AlphaGo(Bonus 1)
The playing strength of AlphaGo(Bonus 2)