IGB GO: A self-learning GO program
Lin WU
Information & Computer Science
University of California, Irvine
12/02/2003, Lin WU
Outline
Background:
– What is GO?
– Existing GO programs
IGB GO
Past work:
– Three past scenarios
– Present scenario
Discussion
Conclusion
Demo
What is GO
Black and white players play alternately.
Black plays first.
Basic concepts:
– Liberty
– Eye
– Territory
– Unconditional life
– Position
What is GO (cont.)
Rules:
– A stone or group of stones is captured when its liberties drop to 0.
– Captured stones are removed from the board.
– The winner is determined by counting territory.
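The capture rule can be made concrete with a flood fill that collects a group and its liberties. This is a minimal sketch, assuming a hypothetical board encoding of lists of 'b', 'w' and '.' (empty); it ignores suicide and ko, which the slide does not cover.

```python
def group_and_liberties(board, r, c):
    """Flood-fill the group containing (r, c); return (stones, liberties)."""
    color, n = board[r][c], len(board)
    stones, liberties = set(), set()
    stack = [(r, c)]
    while stack:
        cr, cc = stack.pop()
        if (cr, cc) in stones:
            continue
        stones.add((cr, cc))
        for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
            if 0 <= nr < n and 0 <= nc < n:
                if board[nr][nc] == ".":
                    liberties.add((nr, nc))          # empty neighbour = liberty
                elif board[nr][nc] == color:
                    stack.append((nr, nc))           # same colour = same group
    return stones, liberties


def play(board, r, c, color):
    """Place a stone, remove adjacent enemy groups left with 0 liberties,
    and return the number of captured stones (suicide and ko not checked)."""
    board[r][c] = color
    enemy = "w" if color == "b" else "b"
    n, captured = len(board), 0
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < n and 0 <= nc < n and board[nr][nc] == enemy:
            stones, libs = group_and_liberties(board, nr, nc)
            if not libs:
                for sr, sc in stones:
                    board[sr][sc] = "."              # captured stones leave the board
                captured += len(stones)
    return captured


# White stone at (1, 1) has one liberty left at (1, 2); black fills it.
board = [list(row) for row in (".b...", "bw...", ".b...", ".....", ".....")]
captured = play(board, 1, 2, "b")
```

A group that shares several neighbours with the new stone is removed only once, because after the first removal those points are empty and no longer match the enemy colour.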
Existing GO programs
There are many existing GO programs:
– KCC Igo
– HARUKA
– Go++
– Goemate
– Hand talk
– The Many Faces of Go: www.smart-games.com
– GNU GO: www.gnu.org/software/gnugo/gnugo.html
– NeuroGo: www.markus-enzenberger.de/neurogo.html
– etc.
None of them can beat an average amateur player.
Conceptual Architecture
Pattern libraries:
– Library for the opening
– Library for corners
– Library for the internal part of the board
– Libraries for attack, defense, connection, etc.
Engine: matches the board position against the libraries.
Evaluation: determines the best move when several patterns match.
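The engine/evaluation split above can be sketched as a scan of every pattern over every board offset, followed by picking the highest-valued hit. This is a hypothetical illustration, not the data format of any program listed here: the Pattern fields, the '?' wildcard and the numeric value are assumptions, and only translations are tried (real libraries also match rotated and mirrored variants).

```python
from typing import NamedTuple


class Pattern(NamedTuple):
    name: str
    cells: tuple   # rows of 'b', 'w', '.' or '?' (wildcard); hypothetical format
    move: tuple    # (row, col) of the suggested move inside the pattern
    value: float   # hypothetical priority used by the evaluation step


def matches(board, pat, top, left):
    """True if every non-wildcard cell of the pattern equals the board cell."""
    for i, row in enumerate(pat.cells):
        for j, want in enumerate(row):
            if want != "?" and board[top + i][left + j] != want:
                return False
    return True


def best_move(board, library):
    """Engine: collect all hits; Evaluation: keep the highest-valued one."""
    n, hits = len(board), []
    for pat in library:
        h, w = len(pat.cells), len(pat.cells[0])
        for top in range(n - h + 1):
            for left in range(n - w + 1):
                if matches(board, pat, top, left):
                    r, c = top + pat.move[0], left + pat.move[1]
                    if board[r][c] == ".":   # the suggested point must be empty
                        hits.append((pat.value, pat.name, (r, c)))
    return max(hits, default=None)


library = [
    Pattern("capture", (".b.", "bw.", ".b."), (1, 2), 10.0),
    Pattern("any-empty", ("???", "?.?", "???"), (1, 1), 1.0),
]
board = [".b...", "bw...", ".b...", ".....", "....."]
choice = best_move(board, library)
```

Here both patterns hit, and the evaluation step resolves the conflict by value; this is exactly where a hand-built library can become inconsistent as it grows.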
Architecture I
The Many Faces of Go (1981 to now)
– Knowledge Representation in The Many Faces of Go, David Fotland, February 27, 1993
– A joseki database of standard corner patterns (36,000 moves)
– A pattern database of 8x8 patterns (4,000 moves)
– A rule-based expert system with about 200 rules that suggests plausible moves for full-board evaluation
Architecture II
GNU GO (March 1989 to now)
– GNU GO documentation
– Pattern libraries:
  General: patterns.db, patterns2.db
  Fuseki (opening): fuseki.db
  Eyes: eyes.db
  Connection: conn.db
  Influence: influence.db, barriers.db
  Etc.
– GNU Go engine: computes board status at several levels, pattern matching, move reasoning.
Why a pattern-based system?
Simple rules don't mean a simple game:
– Simple rules produce an extremely large search space.
Board evaluation is hard, especially in the middle game:
– The representation space is extremely large.
– The evaluation function is sensitive to small differences in the input.
– Result: to get reliable evaluation results, the search has to go very deep.
Pattern-based system:
– Avoids search by pattern matching.
Complexity —— Search time
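Search level   5x5        7x7         9x9         19x19
1              25         49          81          361
2              625        2,401       6,561       130,321
3              15,625     117,649     531,441     4.7E7
4              390,625    5.8E6       4.3E7       1.7E10
Each entry is (number of intersections)^(search level): every intersection is counted as a candidate move at every ply.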
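The search-time table is simple enough to regenerate: each entry is the number of intersections raised to the search level. The sketch below assumes, as the table does, that every intersection counts as a candidate at every ply (occupied points and captures are ignored); `search_sizes` is a hypothetical helper name.

```python
def search_sizes(max_level=4, sizes=(5, 7, 9, 19)):
    """Map search level d -> [(s*s)**d for each board side s]."""
    return {d: [(s * s) ** d for s in sizes] for d in range(1, max_level + 1)}


for level, row in search_sizes().items():
    # Large values in short scientific notation, small ones with separators.
    print(level, "  ".join(f"{x:.2g}" if x >= 1e6 else f"{x:,}" for x in row))
```

For 19x19 at level 4 this gives 361^4 = 16,983,563,041, roughly 1.7E10.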
Problems of pattern-based systems
Everything is manual work.
As the system becomes larger, it is harder to improve the pattern database.
As the database becomes larger, it is more likely to be inconsistent.
Result:
– Performance improves more slowly as it gets better.
IGB GO
http://contact.ics.uci.edu/go.html
A GO program that can improve its performance automatically.
How?
– Use artificial neural networks to learn the evaluation function.
– Improve the quality of the neural networks by improving the quality of the training data.
Architecture of the neural networks
6 planes:
– 1 input plane
– 1 output plane
– 4 transmission planes
A recurrent neural network is used to learn two functions:
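$$O_{i,j} = f_1\left(IN^{b}_{i,j},\, IN^{w}_{i,j},\, IN^{e}_{i,j},\, O^{NE}_{i,j},\, O^{NW}_{i,j},\, O^{SW}_{i,j},\, O^{SE}_{i,j}\right)$$
$$O^{D}_{i,j} = f_2\left(IN^{b}_{i,j},\, IN^{w}_{i,j},\, IN^{e}_{i,j},\, O^{D}_{p_1(i,j)},\, O^{D}_{p_2(i,j)}\right), \qquad D \in \{NE, NW, SW, SE\}$$
where IN^b, IN^w, IN^e are the black, white and empty input channels at intersection (i, j), O^{NE}, O^{NW}, O^{SW}, O^{SE} are the four transmission planes, and p_1(i, j), p_2(i, j) are the two neighbours of (i, j) that precede it along direction D.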
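A forward pass through the six planes can be sketched as four directional sweeps followed by a per-intersection output unit. This is a hypothetical, untrained illustration: it assumes f2 is a single tanh unit shared by all four transmission planes, f1 is a single sigmoid unit, off-board neighbours contribute 0, and the weights are random. It only shows how the 5-input and 7-input functions are wired, not how IGB GO sized or trained them.

```python
import math
import random

# Propagation direction of each transmission plane as (row step, column step);
# row index grows southward, column index grows eastward.
DIRECTIONS = {"NE": (-1, 1), "NW": (-1, -1), "SW": (1, -1), "SE": (1, 1)}


def one_hot(cell):
    """Map 'b', 'w' or '.' to the three input channels (IN_b, IN_w, IN_e)."""
    return (float(cell == "b"), float(cell == "w"), float(cell == "."))


def sweep(board, di, dj, w2):
    """One transmission plane: f2 applied to the local inputs and the two
    already-computed neighbours that precede (i, j) along the direction."""
    n = len(board)
    h = [[0.0] * n for _ in range(n)]
    rows = range(n) if di == 1 else range(n - 1, -1, -1)
    cols = range(n) if dj == 1 else range(n - 1, -1, -1)
    for i in rows:
        for j in cols:
            p1 = h[i - di][j] if 0 <= i - di < n else 0.0  # off-board -> 0
            p2 = h[i][j - dj] if 0 <= j - dj < n else 0.0
            x = one_hot(board[i][j]) + (p1, p2)             # 5 inputs
            h[i][j] = math.tanh(sum(w * v for w, v in zip(w2, x)))
    return h


def forward(board, w1, w2):
    """Output plane: f1 applied to 3 inputs plus the 4 plane states (7 inputs)."""
    planes = {d: sweep(board, di, dj, w2) for d, (di, dj) in DIRECTIONS.items()}
    n = len(board)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            x = one_hot(board[i][j]) + tuple(planes[d][i][j] for d in DIRECTIONS)
            z = sum(w * v for w, v in zip(w1, x))
            out[i][j] = 1.0 / (1.0 + math.exp(-z))          # value in (0, 1)
    return out


rng = random.Random(0)
w1 = [rng.uniform(-1, 1) for _ in range(7)]
w2 = [rng.uniform(-1, 1) for _ in range(5)]
board = ["b....", ".w...", ".....", "...b.", "....w"]
scores = forward(board, w1, w2)
```

Because each sweep visits intersections in an order where both predecessors are already computed, every plane is filled in one pass, and the four planes together let the output at (i, j) depend on the whole board.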
How to improve the training data
1. Initialize a group of neural networks.
2. Let the neural networks play against each other.
3. Identify the set of good moves.
4. Train the neural networks on those good moves.
5. Repeat from step 2.
Two key issues in this system
Given the neural networks, how do we identify "the good moves"?
Given the good moves, how do we improve the neural networks' performance efficiently?
Play against itself
1. Randomly initialize a neural network.
2. The neural network plays against itself over a set of initial setups.
3. If black (or white) wins, learn black's (or white's) moves.
4. Update the weights and repeat from step 2.
Play against itself — Good move identification
Win: the color that gets the larger territory.
Good moves: all the moves played by the winning color.
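Good-move identification in the self-play scenario reduces to filtering the game record by the winning color. A minimal sketch, assuming a hypothetical record of (color, move) pairs and territory counts without komi; returning nothing on a tie is also an assumption, since the slide does not say how ties are handled.

```python
def good_moves_from_self_play(moves, black_territory, white_territory):
    """moves: list of (color, move) in play order, color in {'b', 'w'}.
    Return every move of the color with the larger territory ([] on a tie)."""
    if black_territory == white_territory:
        return []
    winner = "b" if black_territory > white_territory else "w"
    return [m for color, m in moves if color == winner]


game = [("b", "c3"), ("w", "d4"), ("b", "b2"), ("w", "c2")]
learned = good_moves_from_self_play(game, 13, 12)
```

Every move of the winner is treated as good, including the poor ones, which is one plausible reason this scenario can drift toward a deterministic bad pattern.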
Play against itself — Results
Results:
– At first, the network improves.
– Then it begins to get worse.
– Finally, it learns a very deterministic and bad pattern.
Improvement: not guaranteed.
Group playing
1. Initialize a group of 18 neural networks.
2. Randomly pair each neural network with another.
3. The members of each pair play against each other.
4. Identify the set of good moves.
5. Train the losing neural networks on those good moves.
6. Repeat from step 2.
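Step 2 of group playing (random pairing) can be sketched by shuffling the population and pairing neighbours. This assumes an even population, as with the 18 networks above; `random_pairs` is a hypothetical helper, and a seeded `random.Random` is passed in so runs are reproducible.

```python
import random


def random_pairs(players, rng):
    """Shuffle the population and pair neighbours: (p0, p1), (p2, p3), ..."""
    if len(players) % 2:
        raise ValueError("need an even number of players")
    order = list(players)
    rng.shuffle(order)
    return [(order[i], order[i + 1]) for i in range(0, len(order), 2)]


pairs = random_pairs(list(range(18)), random.Random(0))  # 18 networks -> 9 pairs
```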
Group playing — Good move identification
Each pair has two players, A and B.
Game 1: A plays black and B plays white; the result is R1.
Game 2: B plays black and A plays white; the result is R2.
If R1 > R2, then A is the better player and B is the loser, so B learns all the moves played by A.
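The two-game comparison can be sketched as follows, assuming (this is not stated on the slide) that R1 and R2 are both black's score margin, so A playing black in game 1 and B playing black in game 2 are compared on equal terms. The helper names and the (player, move) record format are hypothetical.

```python
def pick_loser(a, b, r1, r2):
    """r1: game 1 result (a black, b white); r2: game 2 result (b black, a white).
    Both assumed to be black's score margin. Returns (winner, loser) or None on a tie."""
    if r1 > r2:
        return a, b
    if r2 > r1:
        return b, a
    return None


def moves_by(player, *games):
    """Each game is a list of (player, move); return every move the player made."""
    return [m for game in games for p, m in game if p == player]


game1 = [("A", "c3"), ("B", "d4"), ("A", "b2")]
game2 = [("B", "c3"), ("A", "c4")]
winner, loser = pick_loser("A", "B", 5, -3)
lesson = moves_by(winner, game1, game2)   # what the loser is trained on
```

Playing both colors cancels the first-move advantage, which is why the comparison uses two games per pair rather than one.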
Group playing — Results
Results– Improve at beginning.– If a player dominates, the whole system degrades
as “play against itself”.– No indication of converge till now. (9 machines, 1
month on 9 by 9 board)
Improvement: No guarantee.
ABC scenario
1. Initialize a group of neural networks.
2. Randomly pick three different neural networks (A, B, C) from the group.
3. Let A and B play against each other.
4. Identify the set of good moves.
5. Train the neural networks on those good moves.
6. Repeat from step 2.
ABC scenario — Good move identification
For a given pair with players A and B:
– Suppose B is the loser.
Randomly assign a teacher C:
– At each of B's turns, C tells B which move C would make.
  Either C's suggested move is the same as B's, or it is different.
Based on C's suggested move, A plays B again. Compared with the original game, the result is:
– Better: an understandable good move
– The same
– Worse
The set of good moves consists of all the understandable good moves.
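The teacher test can be sketched as a loop over B's turns that replays the game only where C disagrees and keeps the suggestions that improved B's result. The `replay_result` callback is a hypothetical stand-in for actually replaying A against B with B forced to play C's move at that turn; measuring results as B's score margin is also an assumption.

```python
def understandable_good_moves(b_moves, c_moves, replay_result, base_result):
    """b_moves / c_moves: B's actual move and C's suggestion at each of B's turns.
    replay_result(k, move): B's margin when the game is replayed with B playing
    `move` at turn k (hypothetical callback). base_result: B's original margin."""
    good = []
    for k, (played, suggested) in enumerate(zip(b_moves, c_moves)):
        if suggested == played:
            continue                      # same move: nothing to learn
        if replay_result(k, suggested) > base_result:
            good.append((k, suggested))   # "better": an understandable good move
        # "the same" or "worse": discarded
    return good


outcomes = {(1, "x"): 3, (2, "y"): -5}    # toy replay results
good = understandable_good_moves(
    ["a", "b", "c"], ["a", "x", "y"],
    lambda k, m: outcomes.get((k, m), 0), base_result=0,
)
```

Each differing suggestion costs a full replay, which fits the slide's observation that training speed in this scenario is unacceptably slow.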
ABC scenario — Results
Results– It took 1 week to get a best player from 3 randomly
initialized players– The best player was beaten by another randomly
initialized player.– The speed of improving became slower as the
performance increased. Improvement: guarantee. Training Speed: unacceptable slow
Present scenario
Output representation:
– Based on two papers:
  Temporal Difference Learning of Position Evaluation in the Game of Go, Nicol N. Schraudolph, Peter Dayan, and Terrence J. Sejnowski, Advances in Neural Information Processing Systems 6, 1994
  Learning to evaluate GO positions via temporal difference methods, Nicol N. Schraudolph, Peter Dayan, and Terrence J. Sejnowski, Soft Computing Techniques in Game Playing, 2000
– Each intersection has an output: a real number in [0,1].
– The output changes meaning: instead of the likelihood of making a move at an intersection, it is the likelihood of securing that intersection as black territory at the end of the game.
– Training uses reinforcement learning.
Good move identification: reinforcement learning identifies good moves automatically.
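With per-intersection ownership outputs, temporal-difference learning needs no hand-picked good moves: each prediction is pulled toward the next one, and the last toward the actual territory at the end of the game. The sketch below is a tabular TD(0) stand-in under stated assumptions: it updates the predicted values directly, whereas the real system would turn the same errors into weight updates for the network.

```python
def td0_update(predictions, final_ownership, alpha=0.1):
    """predictions[t][k]: predicted probability that intersection k ends up as
    black territory, estimated after move t. final_ownership[k]: 1.0 if k is
    black territory at the end, else 0.0. Returns the updated predictions."""
    targets = predictions[1:] + [final_ownership]   # TD(0): bootstrap from t+1
    return [
        [v + alpha * (target - v) for v, target in zip(step, step_targets)]
        for step, step_targets in zip(predictions, targets)
    ]


# Two intersections, two positions; intersection 0 ends up black, 1 does not.
updated = td0_update([[0.5, 0.5], [0.6, 0.4]], [1.0, 0.0], alpha=0.5)
```

The target at the end of the game is the true outcome, which is the "consistent target" property the later discussion credits for the better results.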
Present scenario — Results
Improvement: guaranteed.
Training speed: better than the ABC scenario, but still slow.
Results:
– 5x5:
  3 to 4 hours of training: beats a random player 100% of the time.
  1 to 2 weeks (168 to 336 h): comparable to GNU GO.
  Prediction accuracy is above 90% once more than 50% of the board is occupied.
– 7x7: after 1 month of training, GNU GO still beats it without any difficulty.
Why the results are better
Old architecture:
– The target is inconsistent.
– The target is harder to learn: spatial complexity 3^25 / 8 (= 105,911,076,180.375) for 5x5, i.e. the 3^25 black/white/empty colorings of the board, presumably divided by the board's 8 rotations and reflections.
– The quality of the training data is bad.
New architecture:
– The target is consistent, and at the end of the game it is the true target.
– The target correlates mainly with local information, so the complexity should be much less than 3^25 / 8.
– The quality of the training data is determined by the neural network itself.
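The 3^25 / 8 estimate reads as "all colorings divided by the 8 symmetries of the square" (an interpretation, not stated on the slide). Burnside's lemma gives the exact number of symmetry-distinct colorings, which is slightly larger because symmetric boards have fewer than 8 distinct images. The sketch counts the cycles of each symmetry by brute force; note it counts colorings, not legal positions, so it is an upper bound on distinct legal positions.

```python
from itertools import product


def symmetries(n):
    """The 8 rotations/reflections of an n x n board as coordinate maps."""
    m = n - 1
    return [
        lambda r, c: (r, c),            # identity
        lambda r, c: (c, m - r),        # rotate 90 degrees
        lambda r, c: (m - r, m - c),    # rotate 180 degrees
        lambda r, c: (m - c, r),        # rotate 270 degrees
        lambda r, c: (r, m - c),        # mirror left-right
        lambda r, c: (m - r, c),        # mirror top-bottom
        lambda r, c: (c, r),            # mirror on the main diagonal
        lambda r, c: (m - c, m - r),    # mirror on the anti-diagonal
    ]


def cycle_count(n, f):
    """Number of cycles of the permutation f on the n*n intersections."""
    seen, cycles = set(), 0
    for start in product(range(n), repeat=2):
        if start in seen:
            continue
        cycles += 1
        cell = start
        while cell not in seen:
            seen.add(cell)
            cell = f(*cell)
    return cycles


def distinct_colorings(n, colors=3):
    """Burnside: average number of colorings fixed by each symmetry."""
    syms = symmetries(n)
    return sum(colors ** cycle_count(n, f) for f in syms) // len(syms)


naive = 3 ** 25 / 8               # 105911076180.375
exact = distinct_colorings(5)     # slightly larger than naive
```

The two numbers differ by well under 0.01%, so dividing by 8 is a good approximation of the old target's complexity.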
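Is the present architecture enough? Search time complexity
Board size   2 (25%)    3 (25%)    2 (50%)    3 (50%)
5x5          64         729        4,096      531,441
7x7          4,096      531,441    1.7E7      2.8E11
9x9          1E6        3.5E9      1.1E12     1.2E19
19x19        1.2E27     8.7E42     1.5E54     7.6E85
Each entry is b^⌊p·N⌋, where N is the number of intersections, b is 2 or 3 and p is 25% or 50% (for example, 5x5 with b = 3 and p = 50%: 3^12 = 531,441). Presumably b is the number of candidate moves considered per turn and p the fraction of the board searched.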
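The second complexity table can be regenerated from the pattern its entries follow, b^⌊p·N⌋. The reading of b as candidate moves per turn and p as the fraction of the board still to be played is an assumption; `pruned_search_sizes` is a hypothetical helper, and integer arithmetic keeps the floor exact.

```python
def pruned_search_sizes(sizes=(5, 7, 9, 19), branching=(2, 3), percents=(25, 50)):
    """Map board side s -> [b ** floor(pct% of s*s)] in the column order
    2 (25%), 3 (25%), 2 (50%), 3 (50%)."""
    rows = {}
    for s in sizes:
        n = s * s
        rows[s] = [b ** (n * pct // 100) for pct in percents for b in branching]
    return rows


for side, row in pruned_search_sizes().items():
    print(f"{side}x{side}", "  ".join(f"{x:.2g}" for x in row))
```

Even with only 2 candidates per turn and a quarter of the board left, 19x19 needs about 1.2E27 sequences, which is why a good evaluation function is needed to cut the search.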
Known Problems
Intrinsically hard problems:
– No complexity bound on the number of iterations needed to get a better player
– The representation space is extremely large
Known Problems — Technical
Temporary technical problems:
– Lack of a position-level evaluation method
– Unable to respond correctly to some unusual cases
  Unable to AUTOMATICALLY identify the unusual cases that will cause problems
– Time complexity per iteration:
  Play a match: O(n^6 W)
  Learn a match: O(n^4 W) for TD(0), O(n^6 W) for Q-learning
  (19/5)^6 ≈ 3011, so an O(n^6 W) step on 19x19 costs about 3,000 times as much as on 5x5.
Bounds on the number of iterations
Maybe exponential.
Observations:
– Human beings: the complexity increases as the player's level increases.
– Present implementation: the same.
Important to know:
– How fast does the complexity increase as the player's level increases?
The complexity could be exponential
– Suppose one player, or a small group of players, dominates the whole system.
– How much time is needed to obtain a better new player or a better group?
– If the experiment is repeated with the same amount of time, there is only a 50% chance of getting a better one, due to symmetry.
– So the expected time at least doubles with each level: the growth is at least exponential, with base 2.
Position-level performance evaluation
With it:
– Study the iteration bounds empirically.
– Use the evaluation results to find a good tradeoff between performance and search space.
Without it:
– Every method is trial and error, but there are infinitely many potential methods to try.
Time complexity per iteration
Separate "play" and "learn":
– A database of training data
– Sources of training data:
  Best players playing against each other
  An online server
  Manually finding ways to beat the best player
– All players learn from the generated training data
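Separating "play" from "learn" can be sketched as a shared store that several producers append to and every player reads from in shuffled batches. Everything here is hypothetical: the Example fields, the string position encoding, the source labels and the de-duplication on (position, move). The same structure could also hold the unusual moves discussed next.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Example:
    position: str  # board serialized row by row, e.g. "b.w.." (hypothetical)
    move: int      # index of the recommended intersection
    source: str    # e.g. "self-play", "server", "manual"


@dataclass
class TrainingDB:
    examples: list = field(default_factory=list)
    seen: set = field(default_factory=set)

    def add(self, ex):
        """Store an example once per (position, move); return True if new."""
        key = (ex.position, ex.move)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.examples.append(ex)
        return True

    def batches(self, size, rng):
        """Yield shuffled batches so every player can learn from the same data."""
        order = list(self.examples)
        rng.shuffle(order)
        for i in range(0, len(order), size):
            yield order[i:i + size]


db = TrainingDB()
db.add(Example("b.w..", 1, "self-play"))
db.add(Example("b.w..", 1, "server"))   # duplicate position and move: ignored
db.add(Example("..w.b", 0, "manual"))
batches = list(db.batches(size=1, rng=random.Random(0)))
```

Because producers only append and learners only read, matches can be played once and reused by every player, instead of each player replaying its own games every iteration.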
Unusual move identification
Difficulty:
– The search space is huge.
  Hard to identify automatically.
Possible solution:
– Use a database to record all such moves once they appear.
  Can be implemented the same way as the training database.
Why it’s so hard
No method addresses the tough problem explicitly.
– Key problems:
  Extremely large search space
  Positions are hard to evaluate
Present strategy: reduce the search space by improving the evaluation function.
Why it’s so hard (cont.)
Reinforcement learning may not be enough:
– Nicol N. Schraudolph: 6 years without any observable progress
– Arthur Samuel: "no progress has been made in overcoming [this defect]" (11 years, 1956-1967) (Blondie24, pp. 146-147)
Neural networks may not learn:
– Why? The representation space is huge even for the last move. On a 9x9 board that is 90% occupied by equal numbers of black and white stones, each color has about 81 × 0.9 × 0.5 ≈ 36 stones, so the number of possible positions is
$$\binom{81}{36}\binom{45}{36} \approx 1.154 \times 10^{32}$$
– Solution: generalization ability; automatically identifying features.
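The position count for a 90%-occupied 9x9 board is quick to check: choose the black intersections among 81, then the white ones among the 45 that remain. The sketch uses Python's `math.comb`; like the slide's estimate, it ignores legality (for example, groups without liberties), so it overstates the number of real positions.

```python
from math import comb

points = 81                                     # 9x9 board
stones_per_color = round(points * 0.9 * 0.5)    # 36.45 rounds to 36

# Black intersections first, then white among the remaining empty ones.
count = comb(points, stones_per_color) * comb(points - stones_per_color, stones_per_color)
print(f"{count:.3e}")
```

The result is roughly 1.15 × 10^32 positions, far more than any set of training games can cover directly.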
Lesson I
Ability to improve ==? The best
– The speed of improvement:
  5x5: 3 to 4 hours of training to beat a random player; 1 to 2 weeks (168 to 336 h) to be comparable to GNU GO
  7x7: after 1 month of training, GNU GO is still able to win.
Lesson II
Deterministic function between input and output ==? A neural network can learn it without any difficulty
No:
– The intrinsic complexity of the function
– A neural network can only learn the correlation between the input and the output, as a result of hill climbing.
Conclusion
A self-learning GO program is possible, but several technically difficult problems remain:
– Automatic feature discovery
– Automatic learning from failure
– Position-level performance evaluation