Game playing
Chapter 5
Chapter 5 1
Outline
♦ Games
♦ Perfect play: minimax decisions, α–β pruning
♦ Resource limits and approximate evaluation
♦ Games of chance
♦ Games of imperfect information
Games
Reminder: A multi-agent environment is an environment in which each agent needs to consider the actions of other agents and how they affect its own welfare.

In AI, the most common games are of a rather specialized kind: deterministic, turn-taking, two-player, zero-sum games of perfect information. For example, if one player wins a game of chess, the other player necessarily loses.

Why games? The state of a game is easy to represent, and agents are usually restricted to a small number of actions whose outcomes are defined by precise rules.

With the exception of robot soccer, physical games have not attracted much interest in the AI community.
Games formulation
We first consider games with two players, whom we call MAX and MIN. MAX moves first, and then they take turns moving until the game is over.

A game can be formally defined as a kind of search problem with the following elements:

♦ S0: The initial state, which specifies how the game is set up at the start.

♦ PLAYER(s): Defines which player has the move in a state.

♦ ACTIONS(s): Returns the set of legal moves in a state.

♦ RESULT(s, a): The transition model, which defines the result of a move.
Games formulation
♦ TERMINAL-TEST(s): A terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.

♦ UTILITY(s, p): A utility function (also called an objective function), which defines the final numeric value for a game that ends in terminal state s for a player p. → In chess, the outcome is a win, loss, or draw, with values +1, 0, or 1/2. Zero-sum game?? Constant-sum would have been a better term, but zero-sum is traditional.
Game tree
A tree where the nodes are game states and the edges are moves.
(Figure: a partial game tree for tic-tac-toe. MAX (X) moves at the root; MAX (X) and MIN (O) levels alternate down to TERMINAL states, whose utilities from MAX's point of view are −1, 0, or +1.)
Types of games
                         deterministic          chance

perfect information      chess, checkers,       backgammon,
                         go, othello            monopoly

imperfect information    battleships,           bridge, poker
                         blind tictactoe
Minimax
Perfect play for deterministic, perfect-information games
Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play
E.g., a 2-ply game:

(Figure: MAX at the root chooses among moves A1, A2, A3, leading to MIN nodes with backed-up values 3, 2, 2. Their leaves, reached by moves A11 . . . A33, have utilities 3, 12, 8; 2, 4, 6; and 14, 5, 2. The minimax value of the root is 3, so MAX plays A1.)
Minimax algorithm
function Minimax-Decision(state) returns an action
   inputs: state, current state in game
   return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← −∞
   for a, s in Successors(state) do v ← Max(v, Min-Value(s))
   return v

function Min-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← +∞
   for a, s in Successors(state) do v ← Min(v, Max-Value(s))
   return v
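The pseudocode above can be sketched in runnable Python. The nested-list tree representation (a leaf is a numeric utility, an internal node a list of child subtrees) is an illustrative assumption, not from the slides:

```python
def minimax(node, maximizing=True):
    """Minimax value of a game tree. A leaf is a numeric utility;
    an internal node is a list of child subtrees."""
    if not isinstance(node, list):
        return node                      # terminal state: return its utility
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def minimax_decision(children):
    """Index of the best move for MAX among the root's child subtrees."""
    values = [minimax(c, maximizing=False) for c in children]
    return values.index(max(values))

# The slides' 2-ply example: three MIN nodes under a MAX root
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
# minimax(tree) → 3; minimax_decision(tree) → 0 (move A1)
```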
Properties of minimax
Complete?? Yes, if tree is finite (chess has specific rules for this)
Optimal?? Yes, against an optimal opponent. Otherwise??
Time complexity?? O(b^m)
Space complexity?? O(bm) (depth-first exploration)
For chess, b ≈ 35, m ≈ 100 for “reasonable” games ⇒ exact solution completely infeasible
But do we need to explore every path?
Optimal decisions in multiplayer games
Many popular games allow more than two players. How can the concepts of the minimax algorithm be extended to those?

The single value for each node is replaced with a vector of values. For example, in a three-player game with players A, B, and C, a vector ⟨vA, vB, vC⟩ is associated with each node.

For terminal states: design a utility function that returns a vector of values.

For non-terminal states: how do we compute the value of a parent node from the values of its children?
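One common answer is the max^n backup: the player to move picks the child vector that maximizes its own component. A minimal sketch (the list/tuple tree encoding is an illustrative assumption):

```python
def maxn_value(node):
    """Backed-up payoff vector of a node. A leaf is a payoff tuple;
    an internal node is a list [player_index, child, child, ...], and the
    player to move picks the child maximizing its own vector component."""
    if isinstance(node, tuple):
        return node                                  # terminal payoff vector
    player, *children = node
    return max((maxn_value(c) for c in children), key=lambda v: v[player])

# The three-player example tree (players A=0, B=1, C=2)
tree = [0,
        [1, [2, (1, 2, 6), (4, 2, 3)],
            [2, (6, 1, 2), (7, 4, 1)]],
        [1, [2, (5, 1, 1), (1, 5, 2)],
            [2, (7, 7, 1), (5, 4, 5)]]]
# maxn_value(tree) → (1, 2, 6)
```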
Optimal decisions in multiplayer games
(Figure: a three-player game tree with A to move at the root, then B, then C. Leaf payoff vectors: (1, 2, 6), (4, 2, 3), (6, 1, 2), (7, 4, 1), (5, 1, 1), (1, 5, 2), (7, 7, 1), (5, 4, 5). Each player picks the child maximizing its own component: C's choices back up (1, 2, 6), (6, 1, 2), (1, 5, 2), (5, 4, 5); B's choices back up (1, 2, 6) and (1, 5, 2); the root value is (1, 2, 6).)
α–β pruning
The problem with minimax search is that the number of game states it has to examine is exponential in the depth of the tree.

The α–β pruning technique can effectively cut the exponent in half. It can't eliminate the exponent.
α–β pruning example

(Figure, developed over five slides, on the 2-ply minimax tree: the first MIN node's leaves 3, 12, 8 give it the value 3. At the second MIN node, the first leaf 2 bounds its value by 2 ≤ 3, so its remaining two leaves are pruned (marked X X). At the third MIN node, the leaves 14, 5, 2 arrive in turn, lowering its bound from 14 to 5 to 2. The root's minimax value is 3.)
The α–β algorithm
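A minimal Python sketch of the α–β algorithm, over game trees encoded as nested lists (leaf = utility, internal node = list of children; the representation is an illustrative assumption):

```python
def alphabeta(node, alpha=float('-inf'), beta=float('inf'), maximizing=True):
    """Minimax value with α–β pruning. alpha = best value found so far for
    MAX along the path; beta = best found so far for MIN."""
    if not isinstance(node, list):
        return node                              # terminal utility
    if maximizing:
        v = float('-inf')
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            if v >= beta:                        # MIN would never allow this
                return v                         # → prune remaining children
            alpha = max(alpha, v)
        return v
    v = float('inf')
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        if v <= alpha:                           # MAX would never allow this
            return v
        beta = min(beta, v)
    return v

# Same value as plain minimax, with fewer leaves examined:
# alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]) → 3
```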
α–β pruning
(Figure: a path from the root MAX node down through alternating MIN and MAX layers to a node with value V deep in the tree.)
α is the best value (to max) found so far off the current path
If V is worse than α, max will avoid it ⇒ prune that branch
Define β similarly for min
Properties of α–β
The effectiveness of alpha-beta pruning is highly dependent on the order in which the states are examined.
(Figure: the α–β example tree again. The third MIN node's value 2 is only established after its leaves 14 and 5 have already been examined; had the leaf 2 been generated first, the other successors could have been pruned immediately.)
This suggests that it might be worthwhile to try to examine first the successors that are likely to be best. (Obviously, perfect ordering cannot be achieved; otherwise the ordering function itself would play a perfect game.)

With “perfect ordering,” time complexity = O(b^(m/2)) ⇒ doubles the solvable depth
Resource limits
Standard approach:
• Use Cutoff-Test instead of Terminal-Test
e.g., depth limit
• Use Eval instead of Utility
i.e., evaluation function that estimates desirability of position
Suppose we have 100 seconds and explore 10^4 nodes/second ⇒ 10^6 nodes per move ≈ 35^(8/2)

⇒ α–β reaches depth 8 ⇒ a pretty good chess program
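The Cutoff-Test / Eval substitution can be sketched as a depth-limited variant of α–β search (the nested-list tree representation and the `eval_fn` callback are illustrative assumptions):

```python
INF = float('inf')

def h_alphabeta(node, depth, eval_fn, alpha=-INF, beta=INF, maximizing=True):
    """Heuristic α–β: cut off at a depth limit and estimate with eval_fn.
    Leaves are exact utilities; internal nodes are lists of children."""
    if not isinstance(node, list):
        return node                          # true terminal: exact utility
    if depth == 0:
        return eval_fn(node)                 # Cutoff-Test fired: use Eval
    if maximizing:
        v = -INF
        for child in node:
            v = max(v, h_alphabeta(child, depth - 1, eval_fn, alpha, beta, False))
            if v >= beta:
                return v                     # prune
            alpha = max(alpha, v)
        return v
    v = INF
    for child in node:
        v = min(v, h_alphabeta(child, depth - 1, eval_fn, alpha, beta, True))
        if v <= alpha:
            return v                         # prune
        beta = min(beta, v)
    return v
```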
Evaluation functions
An evaluation function returns an estimate of the expected utility of the game from a given position.

The performance of a game-playing program depends strongly on the quality of its evaluation function.

What are the properties of a good evaluation function?
(1) It should order the terminal states in the same way as the true utility function.
(2) The computation must not take too long!
(3) For non-terminal states, it should be strongly correlated with the actual chances of winning.
Evaluation functions
(Figure: two chess positions. (a) Black to move: White slightly better. (b) White to move: Black winning.)
For chess, typically a linear weighted sum of features:

Eval(s) = w1 f1(s) + w2 f2(s) + . . . + wn fn(s)

e.g., w1 = 9 with f1(s) = (number of white queens) − (number of black queens), etc.
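As a concrete sketch, a material-only Eval for chess; the piece weights are the traditional values, and the one-letter board encoding is an illustrative assumption:

```python
# Traditional material values; the board is given as a string of piece
# letters, uppercase = White, lowercase = Black (an illustrative encoding).
WEIGHTS = {'Q': 9, 'R': 5, 'B': 3, 'N': 3, 'P': 1}

def material_eval(board):
    """Eval(s) = sum_i w_i * f_i(s), where f_i(s) is the white-minus-black
    count of piece type i."""
    return sum(w * (board.count(p) - board.count(p.lower()))
               for p, w in WEIGHTS.items())

# material_eval('QRPPppn') → 9 + 5 + 0 - 3 + 0 = 11
```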
Cutting off search
(Figure: two positions, (a) and (b), both with White to move, that look almost identical yet have very different true values.)
The evaluation function should be applied only to positions that are quiescent, that is, unlikely to exhibit wild swings in value in the near future.

Non-quiescent positions can be expanded further until quiescent positions are reached. This extra search is called a quiescence search.
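A quiescence search can be sketched in negamax style; `eval_fn` (scoring for the side to move), `noisy_moves` (e.g. captures only), and `result` are assumed callbacks supplied by the caller, not names from the slides:

```python
def quiescence(state, alpha, beta, eval_fn, noisy_moves, result):
    """Expand only 'noisy' moves until a quiet position is reached, then
    trust the static evaluation. Negamax convention: eval_fn scores the
    position for the player to move, and signs flip between plies."""
    stand_pat = eval_fn(state)          # static score if we stop searching here
    if stand_pat >= beta:
        return stand_pat                # already too good: opponent avoids this
    alpha = max(alpha, stand_pat)
    for move in noisy_moves(state):
        score = -quiescence(result(state, move), -beta, -alpha,
                            eval_fn, noisy_moves, result)
        if score >= beta:
            return score
        alpha = max(alpha, score)
    return alpha
```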
Some other techniques to improve performance
♦ Using a transposition table: It is worthwhile to store the evaluation of the resulting position in a hash table the first time it is encountered, so that we don't have to recompute it on subsequent occurrences.

♦ Forward pruning: On each turn, consider only a beam of the n best moves (according to the evaluation function) rather than considering all possible moves.

♦ Table lookup: Specifically for the opening and ending of games. Use table lookup at first and then switch to search. Near the end of the game there are again fewer possible positions, and thus more chance to do lookup. In 2016, Bourzutschky solved all pawnless six-piece endgames. There is a KQNKRBN endgame that with best play requires 517 moves until a capture, which then leads to a mate.
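A transposition table is just a cache keyed by position. A minimal sketch (real engines key on a Zobrist hash and also store search depth and bound information; here the key is a naive string of the nested-list state, an illustrative assumption):

```python
def minimax_tt(node, maximizing=True, table=None):
    """Minimax over a nested-list game tree (leaf = utility), caching every
    evaluated position so transpositions are computed only once."""
    if table is None:
        table = {}
    key = (str(node), maximizing)        # naive stand-in for a Zobrist hash
    if key in table:
        return table[key]
    if not isinstance(node, list):
        value = node                     # terminal utility
    else:
        values = [minimax_tt(c, not maximizing, table) for c in node]
        value = max(values) if maximizing else min(values)
    table[key] = value
    return value
```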
Digression: Exact values don’t matter
(Figure: a MAX root over two MIN nodes. Left tree: leaves 2, 1 and 4, 2 give MIN values 1 and 2. Right tree: the leaves transformed monotonically to 20, 1 and 400, 20 give MIN values 1 and 20. In both trees MAX prefers the same move.)
Behaviour is preserved under any monotonic transformation of Eval
Only the order matters...
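This invariance is easy to check mechanically; the demo below (an illustration, not from the slides) applies several monotonic transformations to the leaves and confirms the chosen move never changes:

```python
import math

def minimax(node, maximizing=True):
    """Minimax over a nested-list game tree (leaf = utility)."""
    if not isinstance(node, list):
        return node
    values = [minimax(c, not maximizing) for c in node]
    return max(values) if maximizing else min(values)

def best_move(children):
    """Index of MAX's best move among the root's child subtrees."""
    values = [minimax(c, maximizing=False) for c in children]
    return values.index(max(values))

def transform(node, f):
    """Apply f to every leaf of the tree."""
    return [transform(c, f) for c in node] if isinstance(node, list) else f(node)

tree = [[2, 1], [4, 2]]                          # MIN values 1 and 2
for f in (math.sqrt, lambda x: x ** 3, lambda x: 10 * x + 7):  # monotonic
    assert best_move(transform(tree, f)) == best_move(tree)    # order preserved
```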
Nondeterministic games: backgammon
(Figure: a backgammon board with points numbered 1–24, plus the bar positions 0 and 25.)
Nondeterministic games in general
In nondeterministic games, chance is introduced by dice or card-shuffling.
Simplified example with coin-flipping:
(Figure: MAX at the root over two CHANCE nodes with coin-flip branches of probability 0.5 each, over MIN nodes with leaf utilities 2, 4, 7, 4, 6, 0, 5, −2. The MIN values are 2, 4, 0, −2; the chance values are 3 and −1; the root's value is 3.)
Algorithm for nondeterministic games
Expectiminimax gives perfect play
Just like Minimax, except we must also handle chance nodes:
. . .
if state is a Max node then
   return the highest ExpectiMinimax-Value of Successors(state)
if state is a Min node then
   return the lowest ExpectiMinimax-Value of Successors(state)
if state is a Chance node then
   return the probability-weighted average of ExpectiMinimax-Value of Successors(state)
. . .
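A runnable sketch; the tuple-tagged tree encoding (`'max'`, `'min'`, `'chance'`) is an illustrative assumption:

```python
def expectiminimax(node):
    """Value of a game tree with chance nodes. A node is a numeric leaf,
    ('max', [children]), ('min', [children]), or
    ('chance', [(probability, child), ...])."""
    if not isinstance(node, tuple):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    return sum(p * expectiminimax(c) for p, c in children)   # chance node

# The coin-flip example: MIN values 2, 4, 0, -2; chance values 3 and -1
tree = ('max', [
    ('chance', [(0.5, ('min', [2, 4])), (0.5, ('min', [7, 4]))]),
    ('chance', [(0.5, ('min', [6, 0])), (0.5, ('min', [5, -2]))]),
])
# expectiminimax(tree) → 3.0
```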
Nondeterministic games in practice
Time complexity: O(b^m n^m), where n is the number of distinct rolls.

Dice rolls increase b: 21 possible rolls with 2 dice. Backgammon ≈ 20 legal moves (can be 6,000 with a 1-1 roll)

depth 4 = 20 × (21 × 20)^3 ≈ 1.2 × 10^9
As depth increases, the probability of reaching a given node shrinks ⇒ the value of lookahead is diminished
α–β pruning is much less effective
TD-Gammon uses depth-2 search + very good Eval
≈ world-champion level
Digression: Exact values DO matter
(Figure: DICE nodes with outcome probabilities .9 and .1 over MIN nodes. Left tree: leaves 2, 2, 3, 3, 1, 1, 4, 4 give MIN values 2, 3, 1, 4 and chance values .9 × 2 + .1 × 3 = 2.1 and .9 × 1 + .1 × 4 = 1.3, so MAX prefers the left move. Right tree: an order-preserving transformation to MIN values 20, 30, 1, 400 gives chance values 21 and 40.9, and MAX's preference flips.)
Behaviour is preserved only by positive linear transformation of Eval
Pruning in nondeterministic game trees

A version of α–β pruning is possible:

(Figure, developed step by step: a MAX root over two chance nodes, each with two branches of probability 0.5 leading to MIN nodes over numeric leaves. Every node starts with the value interval [−∞, +∞], which narrows as leaves are evaluated. On the left, the two MIN children resolve to 2 and 1, so the left chance node is worth exactly [1.5, 1.5]. On the right, the first MIN child resolves to 0, and as soon as the second MIN child's first leaf 1 bounds it by [−∞, 1], the right chance node is bounded by [−∞, 0.5]; since 0.5 < 1.5, the remaining leaves are pruned.)
Pruning contd.

More pruning occurs if we can bound the leaf values

(Figure, developed step by step: the same tree with every leaf known to lie in [−2, 2], so every node starts with that interval instead of [−∞, +∞]. The left subtree again resolves to 1.5. On the right, the first leaf 0 bounds the first MIN child by [−2, 0], which already bounds the right chance node by [−2, 1]; since 1 < 1.5, the entire right subtree is pruned after a single leaf.)
Partially observable games
Kriegspiel: A partially observable variant of chess in which pieces can move but are completely invisible to the opponent.

White and Black each see a board containing only their own pieces.

A referee, who can see all the pieces, adjudicates the game and periodically makes announcements that are heard by both players (whether a move is legal or illegal, captures, check and its direction, checkmate).
Partially observable games
Recall the belief state: the set of all logically possible board states given the complete history of percepts to date.

A winning strategy, or guaranteed checkmate, is one that, for each possible percept sequence, leads to an actual checkmate for every possible board state in the current belief state, regardless of how the opponent moves.

If a guaranteed checkmate is found, the opponent will lose even if they can see all the pieces.
Example of guaranteed checkmate
Probabilistic checkmate
A probabilistic checkmate is a strategy that wins with probability 1. Such checkmates are still required to work in every board state in the belief state; they are probabilistic only with respect to the randomization of the winning player's moves.

To get the basic idea, consider the problem of finding a lone black king using just the white king. Simply by moving randomly, the white king will eventually bump into the black king even if the latter tries to avoid this fate, since Black cannot keep guessing the right evasive moves indefinitely. In the terminology of probability theory, detection occurs with probability 1.

Example: the KBNK endgame

What about the KBBK endgame?
Summary
Games are fun to work on! (and dangerous)
They illustrate several important points about AI
♦ perfection is unattainable ⇒ must approximate
♦ good idea to think about what to think about
♦ uncertainty constrains the assignment of values to states
♦ optimal decisions depend on information state, not real state
Games are to AI as grand prix racing is to automobile design