CS 188: Artificial Intelligence, Fall 2009
Lecture 6: Adversarial Search
9/15/2009
Dan Klein – UC Berkeley
Many slides over the course adapted from either Stuart Russell or Andrew Moore
Announcements
Written 1 is up (Search and CSPs)
Project 2 will be up soon (Multi-Agent Pacman)
Other announcements: none yet
Today
Finish up Search and CSPs
Start on Adversarial Search
Tree-Structured CSPs
Theorem: if the constraint graph has no loops, the CSP can be solved in O(n d^2) time. Compare to general CSPs, where worst-case time is O(d^n).
This property also applies to probabilistic reasoning (later): an important example of the relation between syntactic restrictions and the complexity of reasoning.
Tree-Structured CSPs
Choose a variable as root, order variables from root to leaves such that every node’s parent precedes it in the ordering
For i = n down to 2, apply RemoveInconsistent(Parent(Xi), Xi)
For i = 1 to n, assign Xi consistently with Parent(Xi)
Runtime: O(n d^2) (why?)
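A minimal sketch of this two-pass procedure in Python, assuming the CSP is given as per-variable domains plus a binary consistency check (the function and parameter names here are illustrative, not from the course code):

```python
def solve_tree_csp(variables, domains, parent, consistent):
    """Solve a tree-structured binary CSP.

    variables:  list ordered root-to-leaves (every parent precedes its children)
    domains:    dict mapping variable -> list of candidate values
    parent:     dict mapping variable -> its parent (root maps to None)
    consistent: consistent(parent_var, parent_val, child_var, child_val) -> bool
    """
    domains = {var: list(vals) for var, vals in domains.items()}  # work on a copy

    # Backward pass (i = n down to 2): drop parent values with no consistent child value.
    for child in reversed(variables[1:]):
        par = parent[child]
        domains[par] = [pv for pv in domains[par]
                        if any(consistent(par, pv, child, cv) for cv in domains[child])]
        if not domains[par]:
            return None  # a domain emptied out: no solution

    root = variables[0]
    if not domains[root]:
        return None

    # Forward pass (i = 1 to n): assign each variable consistently with its parent.
    assignment = {root: domains[root][0]}
    for child in variables[1:]:
        par = parent[child]
        assignment[child] = next(cv for cv in domains[child]
                                 if consistent(par, assignment[par], child, cv))
    return assignment
```

Each arc is checked once against at most d^2 value pairs, which is where the O(n d^2) bound comes from.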
Tree-Structured CSPs
Why does this work? Claim: after each node is processed leftward, all nodes to the right can be assigned in any way consistent with their parent.
Proof: Induction on position
Why doesn’t this algorithm work with loops?
Note: we’ll see this basic idea again with Bayes’ nets
Nearly Tree-Structured CSPs
Conditioning: instantiate a variable, prune its neighbors' domains
Cutset conditioning: instantiate (in all ways) a set of variables such that the remaining constraint graph is a tree
Cutset size c gives runtime O(d^c (n-c) d^2), very fast for small c
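A sketch of cutset conditioning under the same illustrative representation, reusing the solve_tree_csp function above; tree_csp_builder is a hypothetical helper that, for a given cutset assignment, prunes the remaining variables' domains and returns the residual tree-structured CSP (or None if some domain empties):

```python
from itertools import product

def cutset_conditioning(cutset, cutset_domains, tree_csp_builder):
    """Try every assignment to the cutset; solve the remaining tree CSP for each."""
    cut_vars = list(cutset)
    for values in product(*(cutset_domains[v] for v in cut_vars)):
        cut_assignment = dict(zip(cut_vars, values))
        tree_csp = tree_csp_builder(cut_assignment)   # (variables, domains, parent, consistent)
        if tree_csp is None:
            continue                                  # this cutset assignment already fails
        rest = solve_tree_csp(*tree_csp)
        if rest is not None:
            return {**cut_assignment, **rest}
    return None  # no cutset assignment extends to a full solution
```

The outer loop runs d^c times and each tree solve costs O((n-c) d^2), matching the bound above.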
Tree Decompositions*
Create a tree-structured graph of overlapping subproblems, each of which is a mega-variable
Solve each subproblem to enforce local constraints
Solve the CSP over the subproblem mega-variables using our efficient tree-structured CSP algorithm
[Figure: tree decomposition of the map-coloring CSP into mega-variables M1–M4, e.g. M1 over {WA, SA, NT} with domain {(WA=r,SA=g,NT=b), (WA=b,SA=r,NT=g), …}, M2 over {NT, SA, Q} with domain {(NT=r,SA=g,Q=b), (NT=b,SA=g,Q=r), …}, M3 over {Q, SA, NSW}; adjacent mega-variables must agree on shared variables, e.g. Agree(M1,M2) = {((WA=g,SA=g,NT=g), (NT=g,SA=g,Q=g)), …}]
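As a concrete illustration of what the mega-variables and the agreement constraint look like (a sketch mirroring the figure, not course code):

```python
# Each mega-variable's domain is a set of solutions to its subproblem.
M1_domain = [  # subproblem over WA, SA, NT
    {"WA": "r", "SA": "g", "NT": "b"},
    {"WA": "b", "SA": "r", "NT": "g"},
    # ... every other consistent assignment to (WA, SA, NT)
]
M2_domain = [  # subproblem over NT, SA, Q
    {"NT": "r", "SA": "g", "Q": "b"},
    {"NT": "b", "SA": "g", "Q": "r"},
    # ...
]

def agree(value1, value2):
    """Binary constraint between adjacent mega-variables: shared variables must match."""
    shared = set(value1) & set(value2)
    return all(value1[x] == value2[x] for x in shared)

print(agree({"WA": "r", "SA": "g", "NT": "b"}, {"NT": "b", "SA": "g", "Q": "r"}))  # True
print(agree(M1_domain[0], M2_domain[0]))                                           # False: they disagree on NT
```

The CSP over M1–M4 with the agree constraint is tree-structured, so the algorithm from the previous slides applies.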
Iterative Algorithms for CSPs
Local search methods: typically work with “complete” states, i.e., all variables assigned
To apply to CSPs: start with some assignment with unsatisfied constraints; operators reassign variable values. No fringe! Live on the edge.
Variable selection: randomly select any conflicted variable
Value selection by min-conflicts heuristic: choose the value that violates the fewest constraints, i.e., hill climb with h(n) = total number of violated constraints
Example: 4-Queens
States: 4 queens in 4 columns (4^4 = 256 states)
Operators: move a queen within its column
Goal test: no attacks
Evaluation: c(n) = number of attacks
[DEMO]
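A runnable sketch of min-conflicts on this 4-queens formulation (illustrative code, not the project's):

```python
import random

def num_attacks(rows):
    """Count attacking pairs; rows[c] is the row of the queen in column c."""
    n = len(rows)
    return sum(1 for c1 in range(n) for c2 in range(c1 + 1, n)
               if rows[c1] == rows[c2] or abs(rows[c1] - rows[c2]) == c2 - c1)

def min_conflicts_queens(n=4, max_steps=1000):
    rows = [random.randrange(n) for _ in range(n)]   # start from a random complete assignment
    for _ in range(max_steps):
        if num_attacks(rows) == 0:
            return rows                              # goal test: no attacks
        # Variable selection: randomly pick a conflicted column.
        conflicted = [c for c in range(n)
                      if any(rows[c] == rows[o] or abs(rows[c] - rows[o]) == abs(c - o)
                             for o in range(n) if o != c)]
        col = random.choice(conflicted)
        # Value selection: move that queen to the row with the fewest resulting attacks.
        rows[col] = min(range(n),
                        key=lambda r: num_attacks(rows[:col] + [r] + rows[col + 1:]))
    return None  # no solution within the step budget

print(min_conflicts_queens(4))   # e.g. [1, 3, 0, 2]
```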
Performance of Min-Conflicts
Given random initial state, can solve n-queens in almost constant time for arbitrary n with high probability (e.g., n = 10,000,000)
The same appears to be true for any randomly-generated CSP, except in a narrow range of the ratio of constraints to variables
Hill Climbing
Simple, general idea: start wherever, always choose the best neighbor; if no neighbors have better scores than the current state, quit
Why can this be a terrible idea? Complete? Optimal?
What’s good about it?
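A minimal generic sketch of the idea, assuming hypothetical neighbors and score callbacks (higher score is better):

```python
def hill_climb(start, neighbors, score):
    """Greedy hill climbing: repeatedly move to the best neighbor until stuck."""
    current = start
    while True:
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current   # no strictly better neighbor: a local maximum (or plateau)
        current = best
```

It is neither complete nor optimal: it quits at the first local maximum it reaches, which is what random restarts or sideways steps try to work around.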
Hill Climbing Diagram
Random restarts? Random sideways steps?
Simulated Annealing
Idea: escape local maxima by allowing downhill moves
But make them rarer as time goes on
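A sketch of the standard schedule-driven version, with illustrative names; a downhill move of size delta < 0 is accepted with probability e^(delta/T), which shrinks as the temperature T decays:

```python
import math
import random

def simulated_annealing(start, neighbors, score, schedule):
    """schedule(t) -> temperature at step t; should decay toward 0 over time."""
    current = start
    t = 0
    while True:
        T = schedule(t)
        if T <= 0:
            return current
        successors = list(neighbors(current))
        if not successors:
            return current
        candidate = random.choice(successors)
        delta = score(candidate) - score(current)
        # Always accept uphill moves; accept downhill moves with probability e^(delta/T).
        if delta > 0 or random.random() < math.exp(delta / T):
            current = candidate
        t += 1

def linear_schedule(t):
    """Example schedule: temperature decays linearly to 0 over 10,000 steps."""
    return max(0.0, 1.0 - t / 10_000)
```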
Summary
CSPs are a special kind of search problem: states are defined by values of a fixed set of variables; the goal test is defined by constraints on variable values
Backtracking = depth-first search with incremental constraint checks
Ordering: variable and value choice heuristics help significantly
Filtering: forward checking, arc consistency prevent assignments that guarantee later failure
Structure: Disconnected and tree-structured CSPs are efficient
Iterative improvement: min-conflicts is usually effective in practice
Game Playing State-of-the-Art
Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved!
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.
Othello: Human champions refuse to compete against computers, which are too good.
Go: Human champions are beginning to be challenged by machines, though the best humans still beat the best machines. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves, along with aggressive pruning.
Pacman: unknown
GamesCrafters
http://gamescrafters.berkeley.edu/
Adversarial Search
[DEMO: mystery pacman]
Game Playing
Many different kinds of games!
Axes: Deterministic or stochastic? One, two, or more players? Perfect information (can you see the state)?
Want algorithms for calculating a strategy (policy) which recommends a move in each state
Deterministic Games
Many possible formalizations, one is:
States: S (start at s0)
Players: P = {1 ... N} (usually take turns)
Actions: A (may depend on player / state)
Transition function: S x A → S
Terminal test: S → {t, f}
Terminal utilities: S x P → R
Solution for a player is a policy: S → A
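One way to code this formalization up is as an abstract game interface; the class and method names below are illustrative (they are not the Pacman project's API):

```python
class DeterministicGame:
    """Skeleton matching the formalization above."""

    def start_state(self):            # s0
        raise NotImplementedError

    def players(self):                # P = {1, ..., N}
        raise NotImplementedError

    def player_to_move(self, state):  # whose turn it is in this state
        raise NotImplementedError

    def actions(self, state):         # A (may depend on player / state)
        raise NotImplementedError

    def result(self, state, action):  # transition function S x A -> S
        raise NotImplementedError

    def is_terminal(self, state):     # terminal test S -> {t, f}
        raise NotImplementedError

    def utility(self, state, player): # terminal utilities S x P -> R
        raise NotImplementedError
```

A solution for a player is then any function mapping states to actions, i.e. a policy.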
Deterministic Single-Player?
Deterministic, single player, perfect information: know the rules, know what actions do, know when you win. E.g. Freecell, 8-Puzzle, Rubik’s cube … it’s just search!
Slight reinterpretation:
Each node stores a value: the best outcome it can reach
This is the maximal outcome of its children (the max value)
Note that we don’t have path sums as before (utilities at end)
After search, can pick the move that leads to the best node
[Figure: search tree with terminal outcomes win / lose / lose]
Deterministic Two-Player
E.g. tic-tac-toe, chess, checkers
Zero-sum games: one player maximizes the result, the other minimizes the result
Minimax search: a state-space search tree; players alternate; each layer, or ply, consists of a round of moves*
Choose the move to the position with the highest minimax value = best achievable utility against best play
[Figure: two-ply minimax tree, a max layer over a min layer, with leaf values 8, 2, 5, 6]
* Slightly different from the book definition
Tic-tac-toe Game Tree
Minimax Example
Minimax Search
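The minimax computation itself is a short recursion; here is a sketch against the hypothetical DeterministicGame-style interface assumed earlier, where max_player is the agent whose utility we maximize and the other player is assumed to minimize it:

```python
def minimax_value(game, state, max_player):
    """Best achievable utility for max_player against optimal opposition."""
    if game.is_terminal(state):
        return game.utility(state, max_player)
    values = [minimax_value(game, game.result(state, a), max_player)
              for a in game.actions(state)]
    if game.player_to_move(state) == max_player:
        return max(values)   # MAX node
    return min(values)       # MIN node

def minimax_decision(game, state, max_player):
    """Pick the move leading to the position with the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: minimax_value(game, game.result(state, a), max_player))
```

In the single-player deterministic case from a few slides back, every node is a max node and this reduces to picking the best reachable outcome.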
Minimax Properties
Optimal against a perfect player. Otherwise?
Time complexity? O(b^m)
Space complexity? O(bm)
For chess, b ≈ 35, m ≈ 100. Exact solution is completely infeasible. But do we need to explore the whole tree?
[Figure: two-ply minimax tree with leaf values 10, 10, 9, 100]
[DEMO: minVsExp]
Resource Limits
Cannot search to leaves
Depth-limited search: instead, search only to a limited depth in the tree and replace terminal utilities with an eval function for non-terminal positions
Guarantee of optimal play is gone
More plies makes a BIG difference [DEMO: limitedDepth]
Example: suppose we have 100 seconds and can explore 10K nodes/sec, so we can check 1M nodes per move; α-β reaches about depth 8 – a decent chess program?
[Figure: depth-limited tree: a max node over two min nodes, unexpanded leaves marked ?, example leaf values -1, -2, 4, 9; the min values are -2 and 4 and the root max value is 4]
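A sketch of the depth-limited variant, reusing the hypothetical game interface from before; eval_fn is a heuristic score of a non-terminal position from max_player's point of view:

```python
def depth_limited_value(game, state, depth, eval_fn, max_player):
    """Minimax to a fixed depth; estimate the value of cut-off positions with eval_fn."""
    if game.is_terminal(state):
        return game.utility(state, max_player)
    if depth == 0:
        return eval_fn(state)        # the guarantee of optimal play is gone here
    values = [depth_limited_value(game, game.result(state, a), depth - 1,
                                  eval_fn, max_player)
              for a in game.actions(state)]
    if game.player_to_move(state) == max_player:
        return max(values)
    return min(values)
```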
Evaluation Functions
Function which scores non-terminals
Ideal function: returns the utility of the position
In practice: typically a weighted linear sum of features: Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
e.g. f1(s) = (num white queens – num black queens), etc.
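A tiny runnable illustration of that weighted-linear-sum form; the state representation, features, and weights below are made up for the example:

```python
def weighted_linear_eval(state, features, weights):
    """Eval(s) = w1*f1(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Toy state: just material counts. f1 = queen difference, f2 = pawn difference.
chess_features = [lambda s: s["white_queens"] - s["black_queens"],
                  lambda s: s["white_pawns"] - s["black_pawns"]]
chess_weights = [9.0, 1.0]

state = {"white_queens": 1, "black_queens": 1, "white_pawns": 6, "black_pawns": 5}
print(weighted_linear_eval(state, chess_features, chess_weights))   # 9*0 + 1*1 = 1.0
```

Such an eval_fn is what the depth-limited search above plugs in at its cutoff.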
Evaluation for Pacman
[DEMO: thrashing, smart ghosts]
Why Pacman Starves
He knows his score will go up by eating the dot now
He knows his score will go up just as much by eating the dot later on
There are no point-scoring opportunities after eating the dot
Therefore, waiting seems just as good as eating
Iterative Deepening
Iterative deepening uses DFS as a subroutine:
1. Do a DFS which only searches for paths of length 1 or less. (DFS gives up on any path of length 2)
2. If “1” failed, do a DFS which only searches paths of length 2 or less.
3. If “2” failed, do a DFS which only searches paths of length 3 or less.
….and so on.
Why do we want to do this for multiplayer games?
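The pattern itself is only a few lines; depth_limited_dfs below is a hypothetical subroutine that returns a solution path of length at most limit, or None:

```python
def iterative_deepening(problem, depth_limited_dfs, max_limit=100):
    """Run depth-limited DFS with limits 1, 2, 3, ... until one of them succeeds."""
    for limit in range(1, max_limit + 1):
        result = depth_limited_dfs(problem, limit)
        if result is not None:
            return result
    return None   # gave up at max_limit
```

For game playing, one answer to the question above is that the same pattern gives an anytime algorithm: run deeper and deeper depth-limited searches and keep the best move found so far when time runs out.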
α-β Pruning Example
α-β Pruning
General configuration: α is the best value that MAX can get at any choice point along the current path
If n becomes worse than α, MAX will avoid it, so we can stop considering n’s other children
Define β similarly for MIN
[Figure: tree layers alternating Player / Opponent / Player / Opponent, with a node n deeper in the tree]
α-β Pruning Pseudocode
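A sketch of the pruning in code, using the hypothetical game interface from the earlier minimax sketch; α is the best value MAX can already guarantee on the current path, β the best value MIN can already guarantee:

```python
def alpha_beta_value(game, state, max_player,
                     alpha=float("-inf"), beta=float("inf")):
    """Minimax value with alpha-beta pruning."""
    if game.is_terminal(state):
        return game.utility(state, max_player)
    if game.player_to_move(state) == max_player:            # MAX node
        v = float("-inf")
        for a in game.actions(state):
            v = max(v, alpha_beta_value(game, game.result(state, a),
                                        max_player, alpha, beta))
            if v >= beta:
                return v            # MIN above would never let play reach here
            alpha = max(alpha, v)
        return v
    v = float("inf")                                         # MIN node
    for a in game.actions(state):
        v = min(v, alpha_beta_value(game, game.result(state, a),
                                    max_player, alpha, beta))
        if v <= alpha:
            return v                # MAX above would never let play reach here
        beta = min(beta, v)
    return v
```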
α-β Pruning Properties
Pruning has no effect on final result
Good move ordering improves effectiveness of pruning
With “perfect ordering”: time complexity drops to O(b^(m/2)), which doubles the solvable depth. Full search of, e.g., chess is still hopeless!
A simple example of metareasoning, here reasoning about which computations are relevant
Non-Zero-Sum Games
Similar to minimax: utilities are now tuples
Each player maximizes their own entry at each node
Propagate (or back up) values from children
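A sketch of that backup, assuming a hypothetical game.utilities(state) that returns the whole tuple (one entry per player), a player_to_move(state) that indexes into it, and the same actions / result methods as before:

```python
def multiplayer_value(game, state):
    """Non-zero-sum backup: the player to move picks the child whose utility
    tuple is best in that player's own entry."""
    if game.is_terminal(state):
        return game.utilities(state)              # e.g. (1, 2, 6) for three players
    player = game.player_to_move(state)
    return max((multiplayer_value(game, game.result(state, a))
                for a in game.actions(state)),
               key=lambda utilities: utilities[player])
```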
[Figure: three-player game tree with leaf utility triples (1,2,6), (4,3,2), (6,1,2), (7,4,1), (5,1,1), (1,5,2), (7,7,1), (5,4,5)]
Stochastic Single-Player
What if we don’t know what the result of an action will be? E.g., in solitaire the shuffle is unknown; in minesweeper, the mine locations; in Pacman, the ghosts!
Can do expectimax search: chance nodes are like actions, except the environment controls the action chosen
Calculate a utility for each node: max nodes as in search; chance nodes take the average (expectation) of the values of their children
Later, we’ll learn how to formalize this as a Markov Decision Process
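A sketch of the expectimax backup, assuming a hypothetical interface in which node_type(state) is one of "max", "chance", or "terminal" and successors(state) returns (probability, next_state) pairs (with probability 1.0 at max nodes):

```python
def expectimax_value(game, state):
    """Max at the agent's nodes, expectation at chance nodes."""
    kind = game.node_type(state)
    if kind == "terminal":
        return game.utility(state)
    values = [(p, expectimax_value(game, next_state))
              for p, next_state in game.successors(state)]
    if kind == "max":
        return max(v for _, v in values)
    return sum(p * v for p, v in values)   # chance node: probability-weighted average
```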
[Figure: expectimax tree, a max node over chance (average) nodes, with leaf values 10, 4, 5, 7]
[DEMO: minVsExp]
Stochastic Two-Player
E.g. backgammon
Expectiminimax (!)
Environment is an extra player that moves after each agent
Chance nodes take expectations, otherwise like minimax
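Extending the expectimax sketch above with a minimizing opponent gives expectiminimax (same hypothetical interface, with "min" as an extra node type):

```python
def expectiminimax_value(game, state):
    """Max and min for the two agents, expectation at chance nodes (e.g. dice rolls)."""
    kind = game.node_type(state)
    if kind == "terminal":
        return game.utility(state)
    values = [(p, expectiminimax_value(game, next_state))
              for p, next_state in game.successors(state)]
    if kind == "max":
        return max(v for _, v in values)
    if kind == "min":
        return min(v for _, v in values)
    return sum(p * v for p, v in values)   # chance node
```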
Stochastic Two-Player
Dice rolls increase b: 21 possible rolls with 2 dice; backgammon has ≈ 20 legal moves. Depth 4 = 20 x (21 x 20)^3 ≈ 1.2 x 10^9
As depth increases, the probability of reaching a given node shrinks, so the value of lookahead is diminished and limiting depth is less damaging. But pruning is less possible…
TDGammon uses depth-2 search + very good eval function + reinforcement learning: world-champion level play
What’s Next?
Make sure you know what probabilities and expectations are
Next topics: dealing with uncertainty; how to learn evaluation functions; Markov Decision Processes