Adversarial Search - Brown University
![Page 2: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/2.jpg)
Games
“Chess is the Drosophila of Artificial Intelligence” - Kronrod, c. 1966
TuroChamp, 1948
![Page 3: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/3.jpg)
Games
Programming a Computer for Playing Chess - Claude Shannon, 1950.
“The chess machine is an ideal one to start with, since: (1) the problem is sharply defined both in allowed operations (the moves) and in the ultimate goal (checkmate); (2) it is neither so simple as to be trivial nor too difficult for satisfactory solution; (3) chess is generally considered to require "thinking" for skillful play; a solution of this problem will force us either to admit the possibility of a mechanized thinking or to further restrict our concept of "thinking"; (4) the discrete structure of chess fits well into the digital nature of modern computers.”
![Page 4: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/4.jpg)
“Solved” Games
A game is solved if an optimal strategy is known.
Strongly solved: all positions.
Weakly solved: some (start) positions.
![Page 5: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/5.jpg)
Typical Game Setting
Games are usually:
• 2 player
• Alternating
• Zero-sum
  • Gain for one is a loss for another.
• Perfect information
![Page 6: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/6.jpg)
Typical Game Setting
Very much like search:
• Set of possible states
• Start state
• Successor function
• Terminal states (many)
• Objective function
The key difference is alternating control.
![Page 7: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/7.jpg)
Game Trees
[Figure: tic-tac-toe game tree. Levels alternate: player 1 moves (o), player 2 moves (x), player 1 moves (o), …]
![Page 8: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/8.jpg)
Key Differences vs. Search
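[Figure: game tree with p1 at the root, a p2 layer below it, and a p1 layer below that.]
• p1 layers: you select to max score.
• p2 layers: they select to min score.
• Leaves: only get score here.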
![Page 9: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/9.jpg)
Minimax
[Figure: tree with root s0 (max); children s1, s2, s3 (min); s4, s5, s6 below s2 (max); g1, g2, g3 below s5; scores at the leaves.]
Propagate value backwards through tree.
V(s5) = max(V(g1), V(g2), V(g3))
V(s2) = min(V(s4), V(s5), V(s6))
V(s0) = max(V(s1), V(s2), V(s3))
![Page 10: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/10.jpg)
Minimax Algorithm
Compute value for each node, going backwards from the end-nodes.
Max (min) player: select action to maximize (minimize) return.
Optimal for both players (if zero sum).
Assumes perfect play, worst case.
Can run as depth first:
• Time O(b^d)
• Space O(bd)
Requires the agent to evaluate the whole tree.
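The backward propagation described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: it assumes a game tree encoded as nested lists (a hypothetical encoding chosen for brevity), where a bare number is a leaf score and a list is an internal node, and the function name `minimax` is likewise an assumption.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of a tree given as nested lists.

    Assumption: a number is a leaf score; a list holds child nodes.
    Control alternates between the max and min player at each level.
    """
    if not isinstance(node, list):
        return node  # leaf: the score is known here
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Leaf scores 10, 5, -3, 20, -5, 2 grouped under three p2 (min) nodes.
tree = [[10, 5], [-3, 20], [-5, 2]]
print(minimax(tree))  # 5
```

The recursion is depth first, so only the current path is held in memory, but every leaf is still visited.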
![Page 11: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/11.jpg)
Minimax
[Figure: p1 (max) at the root; three p2 (min) nodes; six p1 leaves with scores 10, 5, -3, 20, -5, 2.]
min layer values: 5, -3, -5
max (root) value: 5
![Page 12: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/12.jpg)
Games of Chance
What if there is a chance element?
![Page 13: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/13.jpg)
Stochasticity
An outcome is called stochastic when it is determined at random.
[Figure: a six-sided die. Each face, 1 through 6, has p = 1/6; the probabilities sum to 1.]
![Page 14: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/14.jpg)
Stochasticity
How to factor in stochasticity?
Agent does not get to choose.
• Selecting the max outcome is optimistic.
• Selecting the min outcome is pessimistic.
Must be probability-aware.
Be aware of who is choosing at each level.
• Sometimes it is you.
• Sometimes it is an adversary.
• Sometimes it is a random number generator.
Insert a randomization layer.
![Page 15: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/15.jpg)
ExpectiMax
[Figure: game tree with dice (stochastic) layers inserted between the p1 layer, where you select (max), and the p2 layer, where they select to min score.]
![Page 16: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/16.jpg)
Expectation
How to compute value of stochastic layer?
What is the average die value?

$$\frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$$

This factors in both probabilities and the value of event.
In general, given random event x and function f(x):

$$E[f(x)] = \sum_{x} P(x)\, f(x)$$
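As a small check of the expectation formula, the sketch below computes E[f(x)] for a fair die. The function name `expectation` and the dictionary encoding of the distribution are illustrative assumptions; exact fractions are used so that no floating-point rounding enters the result.

```python
from fractions import Fraction


def expectation(dist, f=lambda x: x):
    """Return the sum over x of P(x) * f(x).

    Assumption: `dist` maps each outcome x to its probability P(x).
    """
    return sum(p * f(x) for x, p in dist.items())


# A fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(sum(die.values()))                     # probabilities sum to 1
print(expectation(die))                      # 7/2, i.e. 3.5
print(expectation(die, lambda x: x * x))     # expected squared face value
```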
![Page 17: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/17.jpg)
ExpectiMax
[Figure: the same tree as before, with each dice layer labeled stochastic (expectation): you select (max) at p1, they select to min score at p2, and dice layers take the expectation over outcomes.]
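One way to sketch the three kinds of layers in code is shown below. The tuple encoding (`"max"`, `"min"`, `"chance"`) and the function name `expectimax` are assumptions made for this illustration, not notation from the slides.

```python
from fractions import Fraction


def expectimax(node):
    """Evaluate a tree with max, min, and chance layers.

    Assumed encoding: a number is a leaf score; ("max", [children]) and
    ("min", [children]) are player nodes; ("chance", [(p, child), ...])
    is a stochastic node whose value is the expectation over outcomes.
    """
    if not isinstance(node, tuple):
        return node
    kind, children = node
    if kind == "max":
        return max(expectimax(c) for c in children)
    if kind == "min":
        return min(expectimax(c) for c in children)
    if kind == "chance":
        return sum(p * expectimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")


half = Fraction(1, 2)
# You choose between a coin flip followed by the adversary's move, or a sure 1.
tree = ("max", [
    ("chance", [(half, ("min", [4, 6])), (half, ("min", [0, 10]))]),
    1,
])
print(expectimax(tree))  # 2
```

In the example the chance node is worth 1/2 · 4 + 1/2 · 0 = 2, which beats the sure payoff of 1.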
![Page 18: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/18.jpg)
Minimax
[Figure: p1 (max) at the root; three p2 (min) nodes; six p1 leaves with scores 10, 5, -3, 20, -5, 2.]
min layer values: 5, -3, -5
max (root) value: 5
![Page 19: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/19.jpg)
In Practice
Can run as depth first:
• Time O(b^d)
• Space O(bd)
Depth is too deep.
• 10s to 100s of moves.
Breadth is too broad.
• Chess: 35, Go: 361.
Full search never terminates in practical time for non-trivial games.
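To get a feel for these numbers, the sketch below estimates the size of a full tree, b^d, on a log scale. The branching factors 35 and 361 are the ones quoted above; the depths of 80 and 150 plies are illustrative assumptions, not figures from the slides, and `log10_nodes` is a hypothetical helper name.

```python
import math


def log10_nodes(branching, depth):
    """Return log10 of branching**depth, the rough size of a full game tree."""
    return depth * math.log10(branching)


# Depths are assumed for illustration only.
for name, b, d in [("Chess", 35, 80), ("Go", 361, 150)]:
    print(f"{name}: about 10^{log10_nodes(b, d):.0f} nodes")
```

Even at hundreds of millions of positions per second, trees of this size are far beyond exhaustive search.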
![Page 20: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/20.jpg)
What Is To Be Done?
Terminate early.
Branch less often.
[Figure: game tree with alternating p1 and p2 layers.]
![Page 21: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/21.jpg)
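Alpha-Beta
[Figure: the minimax example tree with pruning. The first p2 (min) node sees leaves 10 and 5 and returns 5; the second sees -3 and the third sees -5, and their remaining leaves are not evaluated.]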
![Page 22: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/22.jpg)
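Alpha-Beta
[Figure: max node S with min-node child A (leaves 10 and 5, value 5) and a second min-node child whose first leaf is B (value -3).]
At a min layer:
If V(B) ≤ V(A) then prune B’s siblings.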
![Page 23: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/23.jpg)
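Alpha-Beta
[Figure: the mirrored case, with a min node S above a max layer containing nodes A and B; values shown are 10, 5, 3, and 5.]
At a max layer:
If V(A) ≥ V(B) then prune A’s siblings.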
![Page 24: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/24.jpg)
Alpha-Beta
[Figure: tree rooted at S with alternating min and max layers, containing nodes A and B.]
More generally:
• α is highest max
• β is lowest min
If max node:
• prune if v ≥ β
If min node:
• prune if v ≤ α
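The pruning rules can be sketched as follows, in the style of the standard textbook formulation. The nested-list tree encoding and the optional `visited` list (used only to show which leaves are actually evaluated) are assumptions made for this illustration.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value with alpha-beta pruning on a nested-list tree.

    alpha: best value found so far for max along the current path.
    beta:  best value found so far for min along the current path.
    Assumption: numbers are leaf scores, lists are internal nodes.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record each leaf that gets evaluated
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:  # min above will never allow this branch
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:  # max above already has something at least as good
            return v
        beta = min(beta, v)
    return v


seen = []
print(alphabeta([[10, 5], [-3, 20], [-5, 2]], visited=seen))  # 5
print(seen)  # [10, 5, -3, -5]
```

On the example tree the leaves 20 and 2 are never evaluated, yet the root value is the same as plain minimax.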
![Page 25: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/25.jpg)
Alpha Beta
(from Russell and Norvig)
![Page 26: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/26.jpg)
Alpha Beta Pruning
Single most useful search control method:
• Throw away whole branches.
• Use the min-max behavior.
Resulting algorithm: alpha-beta pruning.
Empirically: roughly square-roots the branching factor.
• Effectively doubles the search horizon.
Alpha-beta makes the difference between novice and expert computer game players. Most successful players use alpha-beta.
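The link between the two empirical claims can be made explicit with a short calculation. It is a back-of-the-envelope sketch, assuming pruning reduces the effective branching factor from b to about √b and that the node budget stays fixed at roughly b^d; d' denotes the depth reachable with pruning (a symbol introduced here for illustration).

$$
(\sqrt{b})^{d'} = b^{d}
\;\Longrightarrow\;
b^{d'/2} = b^{d}
\;\Longrightarrow\;
d' = 2d
$$

For chess with b = 35, this corresponds to an effective branching factor of about √35 ≈ 5.9.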
![Page 27: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/27.jpg)
What Is To Be Done?
Terminate early.
Branch less often.
[Figure: game tree with alternating p1 and p2 layers.]
![Page 28: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/28.jpg)
In Practice
Solution: substitute evaluation function.
• Like a heuristic - estimate value.
• In this case, probability of win or expected score.
• Common strategy:
  • Run to fixed depth then estimate.
  • Careful lookahead to depth d, then guess.
[Figure: lookahead through p1, p2, p1 layers, followed by an estimate.]
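A depth-limited search with an evaluation function might look like the sketch below. The game is an assumption chosen to keep the example self-contained: a subtraction game in which players alternately take 1 to 3 stones and whoever takes the last stone wins. The names `evaluate` and `depth_limited_value` are hypothetical, and the heuristic used here happens to be exact for this toy game, which is rarely true of real evaluation functions.

```python
def evaluate(stones, max_to_move):
    """Heuristic estimate at the search horizon, from max's point of view.

    Assumption: the player to move is favored unless stones is a multiple of 4.
    """
    mover_wins = stones % 4 != 0
    return 1 if mover_wins == max_to_move else -1


def depth_limited_value(stones, max_to_move, depth):
    """Minimax to a fixed depth, then substitute the evaluation function."""
    if stones == 0:
        # Terminal state: the previous player took the last stone and won.
        return -1 if max_to_move else 1
    if depth == 0:
        return evaluate(stones, max_to_move)  # guess instead of searching on
    values = [
        depth_limited_value(stones - take, not max_to_move, depth - 1)
        for take in (1, 2, 3)
        if take <= stones
    ]
    return max(values) if max_to_move else min(values)


print(depth_limited_value(5, True, 2))  # 1: max can leave 4 stones
print(depth_limited_value(4, True, 2))  # -1: every move loses
```

The search cost is now bounded by the chosen depth rather than by the length of the game.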
![Page 29: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/29.jpg)
Evaluation Functions
![Page 30: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/30.jpg)
Evaluation Functions
![Page 31: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/31.jpg)
Deep Blue (1997)
480 Special Purpose Chips
200 million positions/sec
Search depth 6-8 moves (up to 20)
![Page 32: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/32.jpg)
![Page 33: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/33.jpg)
Evaluation Functions
![Page 34: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/34.jpg)
Search Control
Horizon Effects
• What if something interesting happens at horizon + 1?
• How do you know?
More sophisticated strategies:
• When to generate more nodes?
• How to selectively expand the frontier?
• How to allocate fixed move time?
![Page 35: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/35.jpg)
Monte Carlo Tree Search
…
• Continually estimate value
• Adaptively explore
• Random rollouts to evaluate
![Page 36: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/36.jpg)
Monte Carlo Tree Search
[Figure: search tree with alternating p1 and p2 layers; one path from the root is highlighted.]
Step 1: path selection.
![Page 37: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/37.jpg)
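Monte Carlo Tree Search
[Figure: search tree with alternating p1 and p2 layers; one path from the root is highlighted.]
Step 1: path selection.
UCT:

$$\frac{w_i}{n_i} + c\,\sqrt{\frac{\log n}{n_i}}$$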
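The UCT score can be computed as in the sketch below. It assumes the usual reading of the symbols, which the slide does not spell out: w_i is the number of wins recorded for child i, n_i its visit count, n the parent's visit count, and c an exploration constant. The function name `uct_score`, the default c = √2, and the treatment of unvisited children are assumptions.

```python
import math


def uct_score(wins, visits, parent_visits, c=math.sqrt(2)):
    """Return w_i / n_i + c * sqrt(log(n) / n_i) for one child.

    Assumption: an unvisited child scores infinity so it is tried first.
    """
    if visits == 0:
        return math.inf
    exploit = wins / visits  # average result so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


# Two children with the same win rate: the less-visited one scores higher.
print(uct_score(5, 10, 100) < uct_score(1, 2, 100))
```

The first term favors children that have done well; the second shrinks as a child is visited more, which pushes the search toward less-explored moves.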
![Page 38: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/38.jpg)
Monte Carlo Tree Search
[Figure: the search tree with a new p2 node added below the selected path.]
Step 2: expansion.
![Page 39: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/39.jpg)
Monte Carlo Tree Search
[Figure: a rollout runs from the new p2 node down to a terminal state.]
Step 3: rollout.
![Page 40: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/40.jpg)
Monte Carlo Tree Search
[Figure: the tree with the new p2 node and the terminal state reached by the rollout.]
Step 4: update.
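The four steps can be put together in a compact sketch. Everything specific here is an assumption for illustration: the toy game (take 1 to 3 stones, taking the last stone wins), the `Node` class, the convention that a node's win count is kept from the point of view of the player who moved into it, and the choice of the most-visited child as the final move.

```python
import math
import random


class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones, self.player = stones, player  # player = who moves next
        self.parent, self.move = parent, move
        self.children = []
        self.untried = [k for k in (1, 2, 3) if k <= stones]
        self.wins, self.visits = 0, 0


def rollout(stones, player, rng):
    """Play uniformly random moves to a terminal state; return the winner."""
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        player = 1 - player
    return 1 - player  # the player who just took the last stone


def mcts(stones, iterations=2000, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(stones, player=0)
    for _ in range(iterations):
        node = root
        # Step 1: path selection with UCT while the node is fully expanded.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(math.log(parent.visits) / ch.visits))
        # Step 2: expansion, adding one new child if any move is untried.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # Step 3: random rollout to a terminal state.
        winner = rollout(node.stones, node.player, rng)
        # Step 4: update statistics along the path back to the root.
        while node is not None:
            node.visits += 1
            if winner != node.player:  # the player who moved into node won
                node.wins += 1
            node = node.parent
    best = max(root.children, key=lambda ch: ch.visits)
    return best.move, root


move, root = mcts(5)
print(move, [(ch.move, ch.visits) for ch in root.children])
```

With 5 stones, taking 1 leaves the opponent in a losing position, so one would expect that child to attract most of the visits; the exact counts depend on the seed and the constant c.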
![Page 41: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/41.jpg)
Games Today
World champion level:
• Backgammon
• Chess
• Checkers (solved)
• Othello
• Some poker types: “Heads-up Limit Hold’em Poker is Solved”, Bowling et al., Science, January 2015.
Perform well:
• Bridge
• Other poker types
Far off: Go
![Page 42: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/42.jpg)
Go
![Page 43: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/43.jpg)
Very Recently
Lee Sedol 1 - 4 AlphaGo (Google DeepMind)
![Page 44: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/44.jpg)
![Page 45: Adversarial Search - Brown University](https://reader030.vdocuments.site/reader030/viewer/2022012604/61996f2f0935bd73e43ceb67/html5/thumbnails/45.jpg)
Board Games
“ … board games are more or less done and it's time to move on.”