game playing notes adapted from lecture notes for cmsc 421 by b.j. dorr

game playing

Notes adapted from lecture notes for CMSC 421 by B.J. Dorr

2

Outline• What are games?

• Optimal decisions in games– Which strategy leads to success?

• - pruning

• Games of imperfect information

• Games that include an element of chance

3

What are and why study games?• Games are a form of multi-agent environment

– What do other agents do and how do they affect our success?

– Cooperative vs. competitive multi-agent environments.– Competitive multi-agent environments give rise to

adversarial problems a.k.a. games

• Why study games?– Fun; historically entertaining– Interesting subject of study because they are hard– Easy to represent and agents restricted to small number

of actions

4

Relation of Games to Search• Search – no adversary

– Solution is (heuristic) method for finding goal– Heuristics and CSP techniques can find optimal solution– Evaluation function: estimate of cost from start to goal through

given node– Examples: path planning, scheduling activities

• Games – adversary– Solution is strategy (strategy specifies move for every

possible opponent reply).– Time limits force an approximate solution– Evaluation function: evaluate “goodness” of

game position

– Examples: chess, checkers, Othello, backgammon

5

Types of Games

6

Game setup• Two players: MAX and MIN• MAX moves first and they take turns until the game

is over. Winner gets award, looser gets penalty.• Games as search:

– Initial state: e.g. board configuration of chess– Successor function: list of (move,state) pairs specifying

legal moves.– Terminal test: Is the game finished?– Utility function: Gives numerical value of terminal states.

E.g. win (+1), lose (-1) and draw (0) in tic-tac-toe (next)

• MAX uses search tree to determine next move.

7

Partial Game Tree for Tic-Tac-Toe

8

Optimal strategies• Find the contingent strategy for MAX assuming an

infallible MIN opponent.• Assumption: Both players play optimally !!• Given a game tree, the optimal strategy can be

determined by using the minimax value of each node:

MINIMAX-VALUE(n)=UTILITY(n) If n is a terminal

maxs successors(n) MINIMAX-VALUE(s) If n is a max node

mins successors(n) MINIMAX-VALUE(s) If n is a min node

9

Two-Ply Game Tree

10

Two-Ply Game Tree• MAX nodes

• MIN nodes

• Terminal nodes – utility values for MAX

• Other nodes – minimax values

• MAX’s best move at root – a1 - leads to the successor with the highest minimax value

• MIN’s best to reply – b1 – leads to the successor with the lowest minimax value

11

What if MIN does not play optimally?

• Definition of optimal play for MAX assumes MIN plays optimally: maximizes worst-case outcome for MAX.

• But if MIN does not play optimally, MAX will do even better. [Can be proved.]

12

Minimax Algorithmfunction MINIMAX-DECISION(state) returns an action inputs: state, current state in game vMAX-VALUE(state) return the action in SUCCESSORS(state) with value v

function MIN-VALUE(state) returns a utility value if TERMINAL-TEST(state) then return UTILITY(state) v ∞ for a,s in SUCCESSORS(state) do v MIN(v,MAX-VALUE(s)) return v

function MAX-VALUE(state) returns a utility value if TERMINAL-TEST(state) then return UTILITY(state) v ∞ for a,s in SUCCESSORS(state) do v MAX(v,MIN-VALUE(s)) return v

13

Properties of Minimax

Criterion Minimax

Complete? Yes

Time O(bm)

Space O(bm)

Optimal? Yes

14

Tic-Tac-Toe• Depth bound 2

• Breadth first search until all nodes at level 2 are generated

• Apply evaluation function to positions at these nodes

15

Tic-Tac-Toe• Evaluation function e(p) of a position p

– If p is not a winning position for either player – e(p) = (number of a complete rows, columns, or

diagonals that are still open for MAX) – (number of a complete rows, columns, or diagonals that are still open for MIN)

– If p is a win for MAX• e(p) =

– If p is a win for MIN • e(p) = -

16

Tic-Tac-Toe - First stage

x

x

x

xo

xo

xo

xo

1

-1

-21

6-4=2

5-4=1

6-5=1

5-5=0

6-5=1

5-6=-1

5-5=0

5-6=-1

6-6=0

4-6=-2

4-5=-1

5-5=0

xo

xo

x

o

x

o

x o

x

o

xo

x o

MAX’s move

17

Tic-Tac-Toe - Second stagexo

0

1

1

0

3-2=1

5-2=3

3-2=1

4-2=2

3-2=1

4-2=23-3=0

5-3=2

3-3=0

4-3=1

4-3=1

4-3=14-2=2

5-2=3

3-2=1

4-2=2

4-2=2

4-2=2

4-3=1

3-3=0

4-3=1

xx

x

o x

1

x

o x x

o x x

o x x

ox

xox

xox

xox

xox

xox

xox

xxo

xxo

xxo

xxo

xxo

xxo

xxo

ox

xo

xxo

xxo

xxo

xxo

xxo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

MAX’s move

18

Multiplayer games• Games allow more than two players

• Single minimax values become vectors

19

Problem of minimax search• Number of games states is exponential to the

number of moves.– Solution: Do not examine every node– ==> Alpha-beta pruning

• Alpha = value of best choice found so far at any choice point along the MAX path

• Beta = value of best choice found so far at any choice point along the MIN path

• Revisit example …

20

Tic-Tac-Toe - First stage

x

x

xo

-1

6-5=1

5-5=0

6-5=1

5-6=-1

4-5=-1

5-5=0

xo

x o

x

o

xo

x o

Alpha value = -1

Beta value = -1

A

B

C

21

Alpha-beta pruning• Search progresses in a depth first manner • Whenever a tip node is generated, it’s static

evaluation is computed • Whenever a position can be given a backed up

value, this value is computed • Node A and all its successors have been generated • Backed up value -1• Node B and its successors have not yet been

generated • Now we know that the backed up value of the start

node is bounded from below by -1

22

Alpha-beta pruning• Depending on the back up values of the

other successors of the start node, the final backed up value of the start node may be greater than -1, but it cannot be less

• This lower bound – alpha value for the start nodea

23

Alpha-beta pruning• Depth first search proceeds – node B and its

first successor node C are generated

• Node C is given a static value of -1

• Backed up value of node B is bounded from above by -1

• This Upper bound on node B – beta value.

• Therefore discontinue search below node B. – Node B. will not turn out to be preferable to node

A

24

Alpha-beta pruning• Reduction in search effort achieved by

keeping track of bounds on backed up values

• As successors of a node are given backed up values, the bounds on backed up values can be revised – Alpha values of MAX nodes that can never

decrease – Beta values of MIN nodes can never increase

25

Alpha-beta pruning• Therefore search can be discontinued

– Below any MIN node having a beta value less than or equal to the alpha value of any of its MAX node ancestors

• The final backed up value of this MIN node can then be set to its beta value

– Below any MAX node having an alpha value greater than or equal to the beta value of any of its MINa node ancestors

• The final backed up value of this MAX node can then be set to its alpha value

26

Alpha-beta pruning• during search, alpha and beta values are

computed as follows:– The alpha value of a MAX node is set equal to

the current largest final backed up value of its successors

– The beta value of a MINof the node is set equal to the current smallest final backed up value of its successors

27

Alpha-Beta Example

[-∞, +∞]

[-∞,+∞]

Range of possible valuesDo DF-search until first leaf

28

Alpha-Beta Example (continued)

[-∞,3]

[-∞,+∞]

29


[-∞,3]

[-∞,+∞]

30


[3,+∞]

[3,3]

31


[-∞,2]

[3,+∞]

[3,3]

This node is worse for MAX

32


[-∞,2]

[3,14]

[3,3] [-∞,14]

,

33


[−∞,2]

[3,5]

[3,3] [-∞,5]

,

34


[2,2][−∞,2]

[3,3]

[3,3]

35


[2,2][-∞,2]

[3,3]

[3,3]

36

Alpha-Beta Algorithmfunction ALPHA-BETA-SEARCH(state) returns an action inputs: state, current state in game vMAX-VALUE(state, - ∞ , +∞) return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, , ) returns a utility value if TERMINAL-TEST(state) then return UTILITY(state) v - ∞ for a,s in SUCCESSORS(state) do v MAX(v,MIN-VALUE(s, , )) if v ≥ then return v MAX( ,v) return v

37

Alpha-Beta Algorithm

function MIN-VALUE(state, , ) returns a utility value if TERMINAL-TEST(state) then return UTILITY(state) v + ∞ for a,s in SUCCESSORS(state) do v MIN(v,MAX-VALUE(s, , )) if v ≤ then return v MIN( ,v) return v

38

General alpha-beta pruning• Consider a node n

somewhere in the tree• If player has a better

choice at– Parent node of n– Or any choice point further

up

• n will never be reached in actual play.

• Hence when enough is known about n, it can be pruned.

39

Final Comments about Alpha-Beta Pruning

• Pruning does not affect final results• Entire subtrees can be pruned.• Good move ordering improves effectiveness of

pruning• Alpha-beta pruning can look twice as far as

minimax in the same amount of time• Repeated states are again possible.

– Store them in memory = transposition table

40

Games that include chance

• Possible moves (5-10,5-11), (5-11,19-24),(5-10,10-16) and (5-11,11-16)

41


• Possible moves (5-10,5-11), (5-11,19-24),(5-10,10-16) and (5-11,11-16)

• [1,1], [6,6] chance 1/36, all other chance 1/18

chance nodes

42


• [1,1], [6,6] chance 1/36, all other chance 1/18 • Can not calculate definite minimax value, only expected

value

43

Expected minimax valueEXPECTED-MINIMAX-VALUE(n)=

UTILITY(n) If n is a terminal

maxs successors(n) MINIMAX-VALUE(s) If n is a max node

mins successors(n) MINIMAX-VALUE(s) If n is a max node

s successors(n) P(s) . EXPECTEDMINIMAX(s) If n is a chance node

These equations can be backed-up recursively all the way to the root of the game tree.

44

Position evaluation with chance nodes

• Left, A1 wins• Right A2 wins• Outcome of evaluation function may not change when

values are scaled differently.• Behavior is preserved only by a positive linear

transformation of EVAL.

45

Discussion• Examine section on state-of-the-art games yourself• Minimax assumes right tree is better than left, yet

…– Return probability distribution over possible values– Yet expensive calculation

46

Discussion• Utility of node expansion

– Only expand those nodes which lead to significanlty better moves

• Both suggestions require meta-reasoning

47

Summary• Games are fun (and dangerous)

• They illustrate several important points about AI– Perfection is unattainable -> approximation– Good idea what to think about– Uncertainty constrains the assignment of values

to states

• Games are to AI as grand prix racing is to automobile design.

game playing notes adapted from lecture notes for cmsc 421 by b.j. dorr

Documents