

Artificial Intelligence 1: game playing (di.ulb.ac.be/.../AI-course_files/games_2up.pdf, November 2, 2004)


Artificial Intelligence 1: game playing

Lecturer: Tom Lenaerts
Institut de Recherches Interdisciplinaires et de Développements en Intelligence Artificielle (IRIDIA)

Université Libre de Bruxelles


Outline

What are games?

Optimal decisions in games

Which strategy leads to success?

α-β pruning

Games of imperfect information

Games that include an element of chance


What are and why study games?

Games are a form of multi-agent environment

What do other agents do and how do they affect our success?

Cooperative vs. competitive multi-agent environments.

Competitive multi-agent environments give rise to adversarial problems, a.k.a. games

Why study games?

Fun; historically entertaining

Interesting subject of study because they are hard

Easy to represent and agents restricted to a small number of actions


Relation of Games to Search

Search – no adversary

Solution is (heuristic) method for finding goal

Heuristics and CSP techniques can find optimal solution

Evaluation function: estimate of cost from start to goal through given node

Examples: path planning, scheduling activities

Games – adversary

Solution is strategy (a strategy specifies a move for every possible opponent reply).

Time limits force an approximate solution

Evaluation function: evaluate "goodness" of game position

Examples: chess, checkers, Othello, backgammon


Types of Games


Game setup

Two players: MAX and MIN

MAX moves first and they take turns until the game is over. Winner gets award, loser gets penalty.

Games as search:

Initial state: e.g. board configuration of chess

Successor function: list of (move, state) pairs specifying legal moves.

Terminal test: Is the game finished?

Utility function: gives a numerical value for terminal states, e.g. win (+1), lose (-1) and draw (0) in tic-tac-toe (next)

MAX uses search tree to determine next move.
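The four components of this game-as-search formulation can be sketched directly for tic-tac-toe. The state representation and the function names below are our own, not from the slides:

```python
# Games as search, instantiated for tic-tac-toe: initial state, successor
# function, terminal test and utility function (names are our own).

INITIAL_STATE = ((None,) * 9, 'MAX')   # empty 3x3 board, MAX moves first

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def terminal_test(state):
    board, _ = state
    return winner(board) is not None or all(sq is not None for sq in board)

def utility(state):
    # win (+1), lose (-1), draw (0), from MAX's point of view
    return {'MAX': +1, 'MIN': -1, None: 0}[winner(state[0])]

def successors(state):
    # list of (move, state) pairs specifying legal moves
    board, player = state
    nxt = 'MIN' if player == 'MAX' else 'MAX'
    return [(i, (board[:i] + (player,) + board[i + 1:], nxt))
            for i, sq in enumerate(board) if sq is None]

print(len(successors(INITIAL_STATE)))   # 9 legal opening moves
```

From here, MAX's search tree is generated by applying `successors` repeatedly until `terminal_test` holds.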


Partial Game Tree for Tic-Tac-Toe


Optimal strategies

Find the contingent strategy for MAX assuming an infallible MIN opponent.

Assumption: Both players play optimally!

Given a game tree, the optimal strategy can be determined by using the minimax value of each node:

MINIMAX-VALUE(n) =
  UTILITY(n)                                    if n is a terminal state
  max_{s ∈ successors(n)} MINIMAX-VALUE(s)      if n is a MAX node
  min_{s ∈ successors(n)} MINIMAX-VALUE(s)      if n is a MIN node
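The three-case definition translates directly into a short recursive function. A minimal sketch; the nested-list tree and its leaf utilities are a hypothetical example, not from the slides:

```python
# MINIMAX-VALUE(n): utility at a terminal node, otherwise the max (at a MAX
# node) or min (at a MIN node) of the successors' minimax values.

def minimax_value(node, max_to_move):
    if isinstance(node, (int, float)):     # terminal: return UTILITY(n)
        return node
    values = [minimax_value(child, not max_to_move) for child in node]
    return max(values) if max_to_move else min(values)

# MAX root over three MIN nodes with leaf utilities:
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_value(tree, True))   # MIN values are 3, 2, 2 -> root value 3
```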


Two-Ply Game Tree

[Figure, developed over four slides: the two-ply example tree, with minimax values backed up from the leaves to the root.]

The minimax decision

Minimax maximizes the worst-case outcome for MAX.


What if MIN does not play optimally?

Definition of optimal play for MAX assumes MIN plays optimally: maximizes worst-case outcome for MAX.

But if MIN does not play optimally, MAX will do even better. [Can be proved.]


Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a,s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a,s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v
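A runnable Python version of the decision procedure. The explicit game tree, and the helper names `succ`, `term` and `util`, are our own example scaffolding:

```python
# MINIMAX-DECISION over an explicit game tree: actions label the edges,
# leaves hold utilities (hypothetical two-ply example tree).

def minimax_decision(state, successors, terminal_test, utility):
    # return the action in SUCCESSORS(state) whose MIN-VALUE is maximal
    action, _ = max(successors(state),
                    key=lambda pair: min_value(pair[1], successors,
                                               terminal_test, utility))
    return action

def max_value(state, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = float('-inf')               # MAX node: start from -infinity
    for _, s in successors(state):
        v = max(v, min_value(s, successors, terminal_test, utility))
    return v

def min_value(state, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = float('inf')                # MIN node: start from +infinity
    for _, s in successors(state):
        v = min(v, max_value(s, successors, terminal_test, utility))
    return v

TREE = {'root': [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')],
        'b1': [('b11', 3), ('b12', 12), ('b13', 8)],
        'b2': [('b21', 2), ('b22', 4), ('b23', 6)],
        'b3': [('b31', 14), ('b32', 5), ('b33', 2)]}
succ = lambda s: TREE[s]
term = lambda s: not (isinstance(s, str) and s in TREE)
util = lambda s: s

print(minimax_decision('root', succ, term, util))   # a1
```

MAX picks a1 because its MIN node is worth 3, against 2 for the other two branches.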


Properties of Minimax

Criterion    Minimax
Complete?    Yes (if the tree is finite)
Optimal?     Yes (against an optimal opponent)
Time         O(b^m)
Space        O(bm) (depth-first exploration)


Multiplayer games

Games allow more than two players

The single minimax value becomes a vector of values, one per player.


Problem of minimax search

Number of game states is exponential in the number of moves.

Solution: Do not examine every node

==> Alpha-beta pruning

Alpha = value of best choice found so far at any choice point along the path for MAX

Beta = value of best choice found so far at any choice point along the path for MIN

Revisit example …

Alpha-Beta Example

[Figures, developed over nine slides: depth-first search of the two-ply example tree, tracking the range of possible values at each node. The root starts at [-∞,+∞]; after the first MIN subtree (leaves 3, 12, 8) it becomes [3,+∞], with that MIN node settled at [3,3]. The second MIN subtree is abandoned after its first leaf, 2, bounds it at [-∞,2]: this node is worse for MAX. The third MIN subtree narrows from [-∞,14] through [3,5] to [2,2] as leaves 14, 5 and 2 are examined, so the root's minimax value is 3.]


Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a,s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a,s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
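A runnable Python sketch of ALPHA-BETA-SEARCH. The explicit tree below is our own reconstruction of the example tree (it matches the ranges shown on the example slides), and `succ`/`term`/`util` are our own helper names:

```python
# Alpha-beta search: like minimax, but abandon a branch as soon as its value
# falls outside the [alpha, beta] window.

import math

def alpha_beta_search(state, successors, terminal_test, utility):
    best_action, best_value = None, -math.inf
    for a, s in successors(state):
        v = min_value(s, best_value, math.inf, successors, terminal_test, utility)
        if v > best_value:
            best_action, best_value = a, v
    return best_action

def max_value(state, alpha, beta, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for _, s in successors(state):
        v = max(v, min_value(s, alpha, beta, successors, terminal_test, utility))
        if v >= beta:            # beta cutoff: MIN above will avoid this branch
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for _, s in successors(state):
        v = min(v, max_value(s, alpha, beta, successors, terminal_test, utility))
        if v <= alpha:           # alpha cutoff: MAX above already has better
            return v
        beta = min(beta, v)
    return v

TREE = {'root': [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')],
        'b1': [('b11', 3), ('b12', 12), ('b13', 8)],
        'b2': [('b21', 2), ('b22', 4), ('b23', 6)],
        'b3': [('b31', 14), ('b32', 5), ('b33', 2)]}
succ = lambda s: TREE[s]
term = lambda s: not (isinstance(s, str) and s in TREE)
util = lambda s: s

print(alpha_beta_search('root', succ, term, util))   # a1
```

On this tree, the second MIN node is cut off after its first leaf (value 2 ≤ alpha 3), exactly as in the example slides.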


General alpha-beta pruning

Consider a node n somewhere in the tree.

If player has a better choice at the parent node of n,

or at any choice point further up,

n will never be reached in actual play.

Hence when enough is known about n, it can be pruned.


Final Comments about Alpha-Beta Pruning

Pruning does not affect the final result.

Entire subtrees can be pruned.

Good move ordering improves the effectiveness of pruning.

With "perfect ordering," time complexity is O(b^(m/2)): an effective branching factor of sqrt(b)!

Alpha-beta pruning can look twice as far ahead as minimax in the same amount of time.

Repeated states are again possible. Store them in memory = transposition table.
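The effect of move ordering can be seen empirically. This is a small experiment of our own, not from the slides: run alpha-beta on the same random tree twice, once in the given order and once with "perfect" ordering (each node's best child first, using the exact minimax values), counting leaf evaluations:

```python
# Count leaf evaluations of alpha-beta with and without best-first ordering.

import math, random

def minimax(node, max_to_move):
    if isinstance(node, int):
        return node
    vals = [minimax(c, not max_to_move) for c in node]
    return max(vals) if max_to_move else min(vals)

def alphabeta(node, alpha, beta, max_to_move, count, order):
    if isinstance(node, int):
        count[0] += 1                       # one leaf evaluation
        return node
    children = list(node)
    if order:                               # "perfect ordering": best child first
        children.sort(key=lambda c: minimax(c, not max_to_move),
                      reverse=max_to_move)
    v = -math.inf if max_to_move else math.inf
    for c in children:
        w = alphabeta(c, alpha, beta, not max_to_move, count, order)
        if max_to_move:
            v = max(v, w)
            if v >= beta:
                break
            alpha = max(alpha, v)
        else:
            v = min(v, w)
            if v <= alpha:
                break
            beta = min(beta, v)
    return v

random.seed(0)
def random_tree(depth, b=3):
    return (random.randint(0, 99) if depth == 0
            else [random_tree(depth - 1, b) for _ in range(b)])

tree = random_tree(4)                       # 3^4 = 81 leaves in total
plain, ordered = [0], [0]
v1 = alphabeta(tree, -math.inf, math.inf, True, plain, False)
v2 = alphabeta(tree, -math.inf, math.inf, True, ordered, True)
print(v1, plain[0], ordered[0])             # same value; fewer leaves when ordered
```

Both runs return the same root value (pruning does not affect the result), but the well-ordered run evaluates far fewer of the 81 leaves.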


Games of imperfect information

Minimax and alpha-beta pruning require too many leaf-node evaluations.

This may be impractical within a reasonable amount of time.

SHANNON (1950):

Cut off search earlier (replace TERMINAL-TEST by CUTOFF-TEST)

Apply a heuristic evaluation function EVAL (replacing the utility function of alpha-beta)


Cutting off search

Change: if TERMINAL-TEST(state) then return UTILITY(state)

into if CUTOFF-TEST(state,depth) then return EVAL(state)

Introduces a fixed-depth limit depth

depth is selected so that the amount of time used will not exceed what the rules of the game allow.

When cutoff occurs, the evaluation is performed.
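The change above can be sketched as a depth-limited minimax, with CUTOFF-TEST and EVAL as parameters. The toy game below (states are integers, moves add or subtract 1, EVAL is the state itself) is our own illustration:

```python
# H-MINIMAX: minimax with TERMINAL-TEST/UTILITY replaced by CUTOFF-TEST/EVAL.

def h_minimax(state, depth, max_to_move, successors, cutoff_test, eval_fn):
    if cutoff_test(state, depth):          # was: if TERMINAL-TEST then UTILITY
        return eval_fn(state)
    values = (h_minimax(s, depth + 1, not max_to_move,
                        successors, cutoff_test, eval_fn)
              for s in successors(state))
    return max(values) if max_to_move else min(values)

# toy game: "moves" add or subtract 1; cut off at a fixed depth of 3 plies
succ = lambda n: [n + 1, n - 1]
cutoff = lambda n, d: d >= 3
print(h_minimax(0, 0, True, succ, cutoff, lambda n: n))   # 1
```

A real CUTOFF-TEST would also consult the clock rather than a fixed ply count, but the fixed-depth limit is the simplest version of the idea.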


Heuristic EVAL

Idea: produce an estimate of the expected utility of the game from a given position.

Performance depends on the quality of EVAL. Requirements:

EVAL should order terminal nodes in the same way as UTILITY.

Computation may not take too long.

For non-terminal states, EVAL should be strongly correlated with the actual chance of winning.

Only useful for quiescent states (no wild swings in value in the near future)


Heuristic EVAL example

Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)


Heuristic EVAL example

Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

Addition assumes independence
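A concrete instance of the weighted linear sum is chess material count, with the classical piece values as weights and each feature f_i being the difference in piece counts. The function below is our own minimal sketch of that idea:

```python
# Eval(s) = w1*f1(s) + ... + wn*fn(s) for chess material:
# weights are the classical piece values, features are count differences.

WEIGHTS = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

def material_eval(own, opp):
    """own/opp map piece type -> count; each feature is own count minus opp's."""
    return sum(w * (own.get(p, 0) - opp.get(p, 0)) for p, w in WEIGHTS.items())

# up a rook, down a pawn: 5*1 + 1*(-1) = 4
print(material_eval({'rook': 2, 'pawn': 7}, {'rook': 1, 'pawn': 8}))   # 4
```

Summing the weighted features is exactly the independence assumption noted above: the value of a rook is taken to be the same regardless of what other pieces are on the board.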


Heuristic difficulties

Heuristic counts pieces won


Horizon effect

Fixed-depth search thinks it can avoid the queening move


Games that include chance

Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16)


Games that include chance

Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16)

[1,1] and [6,6] have chance 1/36; all other rolls have chance 1/18

chance nodes


Games that include chance

[1,1] and [6,6] have chance 1/36; all other rolls have chance 1/18

Cannot calculate a definite minimax value, only an expected value


Expected minimax value

EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                              if n is a terminal state
  max_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)       if n is a MAX node
  min_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)       if n is a MIN node
  ∑_{s ∈ successors(n)} P(s) · EXPECTED-MINIMAX-VALUE(s)  if n is a chance node

These equations can be backed-up recursively allthe way to the root of the game tree.
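The backed-up recursion runs in a few lines. The tree encoding below (tagged tuples for MAX/MIN/chance nodes, plain numbers for terminals) and the example values are our own:

```python
# EXPECTED-MINIMAX-VALUE: terminals return utility; MAX/MIN nodes take
# max/min; chance nodes take the probability-weighted sum of their children.

def expectiminimax(node):
    if isinstance(node, (int, float)):          # terminal
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: children are (probability, child) pairs
    return sum(p * expectiminimax(c) for p, c in children)

# MAX chooses between two chance nodes:
# 0.5*2 + 0.5*4 = 3.0  versus  0.25*8 + 0.75*4 = 5.0
tree = ('max', [('chance', [(0.5, 2), (0.5, 4)]),
                ('chance', [(0.25, 8), (0.75, 4)])])
print(expectiminimax(tree))   # 5.0
```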


Position evaluation with chance nodes

Left, A1 wins; right, A2 wins.

The outcome of the evaluation function may change when values are scaled in a non-linear way.

Behavior is preserved only by a positive linear transformation of EVAL.
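A tiny numerical check of this point, with our own example values: at a chance node, a monotone but non-linear rescaling of the leaf values can change which move looks best, while a positive linear transformation cannot.

```python
# Each move leads to a fair chance node over two outcomes; pick the move
# maximizing the expected (possibly transformed) leaf value.

def best_move(outcomes_by_move, transform=lambda x: x):
    expected = {m: sum(transform(v) for v in vs) / len(vs)
                for m, vs in outcomes_by_move.items()}
    return max(expected, key=expected.get)

moves = {'A': [3, 3], 'B': [1, 4]}           # E[A] = 3.0 > E[B] = 2.5

print(best_move(moves))                      # A
print(best_move(moves, lambda x: 2 * x + 1))  # A  (positive linear: preserved)
print(best_move(moves, lambda x: x ** 3))     # B  (27 < 32.5: the order flips)
```

Monotone transformations preserve the ordering of individual leaves, but not of their expectations, which is why chance nodes demand more of EVAL than deterministic games do.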


Discussion

Examine the section on state-of-the-art game programs yourself.

Minimax assumes the right tree is better than the left, yet …

Alternative: return a probability distribution over possible values.

Yet this is an expensive calculation.


Discussion

Utility of node expansion:

Only expand those nodes which lead to significantly better moves.

Both suggestions require meta-reasoning


Summary

Games are fun (and dangerous)

They illustrate several important points about AI:

Perfection is unattainable -> approximation

Good idea what to think about

Uncertainty constrains the assignment of values to states

Games are to AI as grand prix racing is to automobile design.