CS 782 Machine Learning
2. Rote Learning
Prof. Gheorghe Tecuci
Learning Agents Laboratory, Computer Science Department
George Mason University
© 2003 G. Tecuci, Learning Agents Laboratory
Overview
Game playing as a performance task
Rote learning in game playing
Learning a static evaluation function
Rote learning issues
Recommended reading
Rote Learning
[Figure: the performance function f maps an input pattern (X1, ..., Xn) to an output value of computation (Y1, ..., Yp); the associated pair [(X1, ..., Xn), (Y1, ..., Yp)] is stored in memory.]
Rote learning consists of memorizing the solutions of solved problems so that the system need not solve them again: during subsequent computations of f(X1, ..., Xn), the performance element can simply retrieve (Y1, ..., Yp) from memory rather than recomputing it.
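As a minimal sketch (in Python, with a toy stand-in for the performance function f), rote learning amounts to memoization:

```python
# Rote learning as memoization: store the associated pair
# [(X1, ..., Xn), (Y1, ..., Yp)] so that f need not be recomputed.
memory = {}

def solve(*xs):
    # Hypothetical expensive performance function standing in for f.
    return sum(x * x for x in xs)

def f(*xs):
    if xs in memory:        # retrieve (Y1, ..., Yp) from memory
        return memory[xs]
    y = solve(*xs)          # otherwise perform the computation
    memory[xs] = y          # and store the associated pair
    return y
```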
Issues in the design of rote learning systems
Memory organization: rote learning requires a useful organization of the memory, so that retrieval of the desired information is very fast.
Stability of the environment: the information stored at one time should still be valid later.
Store-versus-compute trade-off: the cost of storing and retrieving the memorized information should be smaller than the cost of recomputing it.
Game playing as a performance task: Checkers
[Figure: a checkers board with the 32 playable (black) squares numbered 1 through 32.]
There are two players (Grey and White), each having 12 men. They alternately move one of their men. A man can be moved forward diagonally from one black square to another, or it can jump over an opponent's man if the square behind it is vacant; in that case the opponent's man is captured. Any number of men can be jumped (and captured) if the square behind each is vacant. If a man reaches the opponent's last row, it is transformed into a king by placing another man on top of it. A king can move both forward and backward (as opposed to men, which can move only forward). The winning player is the one who succeeds in blocking all of the opponent's men (so that they cannot move) or in capturing all of them.
Game tree search
All the possible plays of a game can be represented as a tree. The root node is the initial state, in which it is the first player's turn to move. The successors of the initial state are the states he can reach in one move, their successors are the states resulting from the other player's possible replies, and so on. Terminal states are those representing a win for the Grey player, a loss for the Grey player, or a draw.
Each path from the root node to a terminal node gives a different complete play of the game. For instance, Grey has seven possible moves at the start of the game, namely: 9-13, 9-14, 10-14, 10-15, 11-15, 11-16, and 12-16. White has seven possible responses: 21-17, 22-17, 22-18, 23-18, 23-19, 24-19, and 24-20. Some of these responses are better, while others are worse. For instance, if Grey opens 9-14 and White plays 21-17, then Grey can jump over White's man and capture it.
The minimax procedure
Minimax is a procedure for assigning values to the nodes in a game tree. The value of a node expresses how good that node is for the first player (called the Max player) and how bad it is for the second player (called the Min player). Therefore, the Max player will always choose to move to the node that has the maximum value among the possible successors of the current node. Similarly, the Min player will always choose to move to the node that has the minimum value among the possible successors of the current node.
In the case of checkers, we consider that Grey is the Max player and White is the Min player.
Given the values of the terminal nodes, the values of the nonterminal nodes are computed as follows:
- the value of a node where it is the Grey player's turn to move is the maximum of the values of its successors (because Grey tries to maximize its outcome);
- the value of a node where it is the White player's turn to move is the minimum of the values of its successors (because White tries to minimize Grey's outcome).
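A minimal sketch of the minimax procedure in Python, assuming the game tree is given explicitly (a leaf is its value for Max; an internal node is the list of its successors):

```python
def minimax(node, maximizing):
    # Terminal node: its value expresses how good it is for Max.
    if isinstance(node, (int, float)):
        return node
    # Back up the values of the successors, alternating turns.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Tiny example: Max to move at the root, two Min nodes below.
tree = [[3, 5], [2, 9]]
print(minimax(tree, True))   # Min backs up 3 and 2; Max chooses 3
```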
Problem
Consider the following game tree, in which the numbers associated with the leaves represent how good they are from the point of view of the maximizing player:
[Figure: a game tree rooted at node a (Max to move), with successors b and c (Min to move), internal nodes d, e, f, g (Max to move), and leaves h, i, j, k, l, m, n, o, p, r carrying the static values 3, 7, 5, 3, 4, 1, 5, 2, 9, and 7.]
What move should be chosen by the Max player, and what should be the response of the Min player, assuming that both are using the minimax procedure?
Solution
Max will move to c, Min will respond by moving to f, and Max will then move to m.
[Figure: the same game tree with the backed-up minimax values; the root node a receives the value 7, obtained along the path through c, f, and m.]
Searching a partial game tree
Size of the search space
A complete game tree for checkers has been estimated to have 10^40 nonterminal nodes. If one assumes that these nodes could be generated at a rate of 3 billion per second, the generation of the whole tree would still require around 10^21 centuries!
Checkers is far simpler than chess which, in turn, is generally far simpler than business competitions or military games.
The tree of possibilities is far too large to be fully generated and searched backward from the terminal nodes for an optimal move.
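The arithmetic behind this estimate (taking a century as roughly $3.15 \times 10^{9}$ seconds):

$$\frac{10^{40}\ \text{nodes}}{3 \times 10^{9}\ \text{nodes/second}} \approx 3.3 \times 10^{30}\ \text{seconds} \approx \frac{3.3 \times 10^{30}\ \text{s}}{3.15 \times 10^{9}\ \text{s/century}} \approx 10^{21}\ \text{centuries}.$$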
Searching a partial game tree
[Figure: a partial game tree rooted at the node corresponding to the current board situation.]
1. Generate a partial game tree.
2. Estimate the values of the leaf nodes by using a static evaluation function.
3. Back-propagate the estimated values.
Heuristic function for board position evaluation: $w_1 f_1 + w_2 f_2 + w_3 f_3 + \dots$, where the $w_i$ are real-valued weights and the $f_i$ are numeric board features (e.g., the number of white pieces, the number of white kings).
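A minimal sketch of such a linear static evaluation function in Python; the particular features and weights below are illustrative, not Samuel's:

```python
WEIGHTS = [1.0, 2.5]    # w1, w2 (illustrative values)

def features(board):
    # Hypothetical numeric board features f1, f2.
    f1 = board["white_men"] - board["grey_men"]        # piece advantage
    f2 = board["white_kings"] - board["grey_kings"]    # king advantage
    return [f1, f2]

def static_eval(board):
    # value = w1*f1 + w2*f2 + ...
    return sum(w * f for w, f in zip(WEIGHTS, features(board)))

board = {"white_men": 11, "grey_men": 12, "white_kings": 1, "grey_kings": 0}
print(static_eval(board))    # (11-12)*1.0 + (1-0)*2.5 = 1.5
```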
What is the justification for this approach?
The idea is that the static evaluation function produces more accurate results when the evaluated nodes are closer to a goal node.
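A sketch of the whole scheme, depth-limited minimax with static evaluation at the frontier, using a toy "game" in place of checkers (successors and static_eval here are stand-ins):

```python
def successors(state):
    # Toy move generator: from integer state s, reach 2s and 2s+1.
    return [2 * state, 2 * state + 1] if state < 8 else []

def static_eval(state):
    return state % 5    # toy stand-in for w1*f1 + w2*f2 + ...

def search(state, depth, maximizing):
    succ = successors(state)
    if depth == 0 or not succ:    # frontier: estimate rather than expand
        return static_eval(state)
    values = [search(s, depth - 1, not maximizing) for s in succ]
    return max(values) if maximizing else min(values)   # back-propagate

print(search(1, 3, True))    # back-propagated estimate for the root
```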
An illustration of rote learning in game playing: Samuel's checkers player
[Figure: a partial game tree rooted at board position A, searched down through levels B, C, and D. The static values at the frontier are backed up by minimax, giving A the estimated value 8. The program estimates the value of A and memorizes the pair (A, 8).]
Improving the performance of the checkers player
Question: Using the memorized value (A, 8) improves the performance. Why?
[Figure: the current position is E; the memorized position A, with its stored value (A, 8), appears within E's search tree.]
Improving the look-ahead power by rote learning
[Figure: searching from the current position E, the program reaches position A; instead of applying the static evaluation function, it retrieves the memorized value (A, 8), which already summarizes an earlier look-ahead search down to level D.]
Answer: This makes the program more efficient for two reasons:
• it does not have to compute the value of A with the static evaluation function;
• the memorized value of A is more accurate than the static value of A, because it is based on a look-ahead search.
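A sketch of this rote-learning step grafted onto the depth-limited search above (reusing the toy successors and static_eval; the memorized table plays the role of the stored (A, 8) pairs):

```python
memorized = {}    # position -> backed-up value, e.g. {A: 8}

def evaluate(state):
    if state in memorized:       # the retrieved value embeds an earlier
        return memorized[state]  # look-ahead search, so it is more accurate
    return static_eval(state)    # fall back on the static estimate

def search_and_memorize(state, depth, maximizing):
    succ = successors(state)
    if depth == 0 or not succ:
        return evaluate(state)
    values = [search_and_memorize(s, depth - 1, not maximizing)
              for s in succ]
    value = max(values) if maximizing else min(values)
    memorized[state] = value     # memorize (position, backed-up value)
    return value
```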
Samuel's results and conclusion
The program developed by Samuel was trained by playing against itself, by playing against people, and by following book games. After training, the memory contained roughly 53,000 positions, and the program became a "rather better-than-average novice, but definitely not ... an expert" (Samuel, 1959).
Samuel estimated that his program would need to memorize about one million positions to approximate a master level of checkers play.
Samuel's experiments demonstrated that significant and measurable learning can result from rote learning alone. By retrieving the stored results of extensive computations, the program can proceed deeper in its reasoning. The price is storage space, access time, and effort in organizing the stored knowledge.
Learning a polynomial evaluation function
$\text{value} = \sum_i w_i f_i$
What are the main problems to be solved?
a) Discovering which features $f_i$ to use in the function
b) Learning the weights $w_i$ of the features to obtain an accurate value for the board position
Learning the weights of the features
Reinforcement learning: the learning procedure is to compare, at each move, the value of the static evaluation function for the current board position with a performance standard that provides a more accurate estimate of that value. The difference between these two estimates controls the adjustment of the weights in the evaluation function, so as to better approximate the performance standard.
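A minimal sketch of this weight-adjustment step in Python (the learning rate alpha and all values are illustrative):

```python
def update_weights(weights, feats, standard, alpha=0.01):
    # Current estimate of the board value: sum of w_i * f_i.
    estimate = sum(w * f for w, f in zip(weights, feats))
    # Move each weight so as to reduce the difference between the
    # estimate and the more accurate performance standard.
    error = standard - estimate
    return [w + alpha * error * f for w, f in zip(weights, feats)]

w = [1.0, 2.5]
print(update_weights(w, [0.5, -1.0], standard=3.0))   # [1.025, 2.45]
```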
Performance standards
What performance standards could be used?
One performance standard could be obtained by conducting a deeper minimax search into future board positions, applying the evaluation function to tip board positions and backing up these values. The idea is that the static evaluation function produces more accurate results when the evaluated nodes are closer to a goal node.
Performance standards: using “f” itself
How could this be implemented?
One considers an iterative procedure for updating “f”: the performance standard for a certain position B is f(successor(B)). That is, one adjusts the weights so as to reduce the difference between f(B) and f(successor(B)).
[Figure: position B is evaluated as f(B); the evaluation f(successor(B)) of its successor serves as the training target.]
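A sketch of this scheme as a training loop over the positions B0, B1, ... of one game, reusing update_weights from above (game_positions and feature_fn are hypothetical):

```python
def train_on_game(weights, game_positions, feature_fn, alpha=0.01):
    for b, b_next in zip(game_positions, game_positions[1:]):
        # The evaluation of the successor position is the target for B.
        target = sum(w * f for w, f in zip(weights, feature_fn(b_next)))
        weights = update_weights(weights, feature_fn(b), target, alpha)
    return weights
```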
Performance standards
What other performance standards could be used?
Another possible performance standard could be obtained from "book games" played between two human experts. In such a case, the static evaluation function should be modified so that the value of the board position corresponding to the move indicated by the book is higher than the values of the positions corresponding to the other possible moves.
Discovering features to use in the evaluation function
The problem of new terms: How could a learning system discover the appropriate terms for representing the knowledge to be learned?
A partial solution is term selection: provide a list of terms from which the most relevant terms are to be chosen.
Samuel started with 38 terms, out of which only 16 are used in the static evaluation function at any one time. The remaining 22 features are maintained on a standby feature list. Periodically, the feature that has the lowest weight among the 16 features currently in use in the evaluation function is replaced with the first feature from the standby list. The replaced feature is placed at the end of the standby list.
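A sketch of this rotation scheme in Python (taking "lowest weight" to mean smallest absolute weight, which is an assumption):

```python
def rotate_features(active, weights, standby):
    # Find the active feature whose weight has the smallest magnitude.
    i = min(range(len(active)), key=lambda k: abs(weights[k]))
    demoted = active[i]
    active[i] = standby.pop(0)   # promote the first standby feature
    weights[i] = 0.0             # the promoted feature starts from scratch
    standby.append(demoted)      # the demoted feature goes to the end
    return active, weights, standby
```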
Other types of static evaluation functions
Signature table: an explicit representation of a function, which gives the value of the function for each possible combination of argument values. Learning the signature table means determining the values of the function for particular combinations of the arguments. Because such a table may be very large, one may reduce it by considering only special combinations of argument values. The signature table is a more general representation than a linear polynomial function.
Neural network: the inputs are the features and the output is the value of the function.
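A minimal sketch of a signature table in Python, quantizing each feature to -1/0/+1 so that only a small number of argument combinations need to be stored:

```python
class SignatureTable:
    def __init__(self):
        self.table = {}    # signature -> learned value

    def signature(self, feats):
        # Quantize each feature to its sign: a "special combination
        # of argument values" that keeps the table small.
        return tuple((f > 0) - (f < 0) for f in feats)

    def value(self, feats, default=0.0):
        return self.table.get(self.signature(feats), default)

    def learn(self, feats, target):
        self.table[self.signature(feats)] = target
```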
Results of Samuel's experiments
• Learning based on signature tables was much more efficient than learning based on a linear polynomial function.
• Learning a signature table from book moves was more efficient than rote learning.
Recommended reading
Mitchell T.M., Machine Learning, Chapter 1: Introduction, pp. 5-14, McGraw-Hill, 1997.
Samuel A.L., Some Studies in Machine Learning Using the Game of Checkers, in Readings in Machine Learning, pp. 535-554.
The Handbook of Artificial Intelligence, vol. III, pp. 335-344 and pp. 457-464.