lecture6games - dept.cs.williams.edu

10
2/14/17 1 Games Andrea Danyluk February 15, 2017 Announcements Programming Assignment 1: Search SCll in progress A note about designing heurisCcs: Add a “feature” at a Cme Consider different weights for different features Think beyond adding heurisCc informaCon together Once you have a funcCon that works well, remove elements to determine whether you really need them Today Games (repeated from last Cme) Planning/problem solving in the presence of an adversary è adversarial search Why games? Easy to measure success or failure States and rules are generally easy to specify InteresCng and complex Space and Cme complexity Uncertainty of adversaries’ acCon, rolls of dice, etc. Go AlphaGo became the first program to beat a human professional Go player without handicaps on a full 19x19 board. In go, b > 300 Uses Monte Carlo tree search to select moves. Uses knowledge learned from a combinaCon of reinforcement and deep learning. Backgammon TDGammon uses depth-2 search + very good evaluaCon funcCon + reinforcement learning (Gerry Tesauro, IBM) World-champion level play 1st AI world champion in any game! Poker Libratus [Sandholm and Brown, CMU] won $1.7m (in chips) from 4 professional poker players over 20 days in January 2017 No-limit Texas Hold’em Hard because it’s a game of imperfect informaCon. Can’t see the opponent’s hand. The “final fronCer” in games… [Adapted from CS 188 Berkeley]

Upload: others

Post on 13-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

2/14/17

1

Games

AndreaDanylukFebruary15,2017

Announcements

•  ProgrammingAssignment1:Search– SCllinprogress– AnoteaboutdesigningheurisCcs:•  Adda“feature”ataCme•  Considerdifferentweightsfordifferentfeatures•  ThinkbeyondaddingheurisCcinformaContogether•  OnceyouhaveafuncConthatworkswell,removeelementstodeterminewhetheryoureallyneedthem

Today

•  Games(repeatedfromlastCme)– Planning/problemsolvinginthepresenceofanadversaryèadversarialsearch

– Whygames?•  Easytomeasuresuccessorfailure•  Statesandrulesaregenerallyeasytospecify•  InteresCngandcomplex

–  SpaceandCmecomplexity–  Uncertaintyofadversaries’acCon,rollsofdice,etc.

Go

•  AlphaGobecamethefirstprogramtobeatahumanprofessionalGoplayerwithouthandicapsonafull19x19board.

•  Ingo,b>300•  UsesMonteCarlotreesearchtoselectmoves.•  UsesknowledgelearnedfromacombinaConofreinforcementanddeeplearning.

Backgammon

•  TDGammonusesdepth-2search+verygoodevaluaConfuncCon+reinforcementlearning(GerryTesauro,IBM)

•  World-championlevelplay•  1stAIworldchampioninanygame!

Poker

•  Libratus[SandholmandBrown,CMU]won$1.7m(inchips)from4professionalpokerplayersover20daysinJanuary2017

•  No-limitTexasHold’em•  Hardbecauseit’sagameofimperfectinformaCon.Can’tseetheopponent’shand.

•  The“finalfronCer”ingames…

[AdaptedfromCS188Berkeley]

2/14/17

2

TypesofGames

Chess,Checkers,Go,ConnectFour

Backgammon

Bajleship,GuessWho?

Bridge,Poker,Scrabble

DeterminisCc Chance

PerfectInformaCon

ImperfectInformaCon

[AdaptedfromRussellandfromCS188Berkeley]

TypesofGames

Chess,Checkers,Go,ConnectFour

Backgammon

Bajleship,GuessWho?

Bridge,Poker,Scrabble

DeterminisCc Chance

PerfectInformaCon

ImperfectInformaCon

[AdaptedfromRussellandfromCS188Berkeley]

Wantalgorithmsforcalcula1ngastrategy(policy)thatrecommendsamoveineachstate

ConnectFourDemo

•  Withperfectplay,firstplayercanforceawinbystarCnginthemiddlecolumn.

•  BystarCnginoneofthetwoadjacentcolumns,thefirstplayerallowsthesecondplayertoreachadraw.

•  BystarCnginanyofthefouroutercolumns,thefirstplayerallowsthesecondplayertoforceawin.

•  Thereexistperfectplayers–mydemoprogramisnotoneofthem.

GamePlayingasaSearchProblem

Notethateachlevelinthegametree(i.e.,eachhalfmove)iscalledaply.

FormulaCngGamePlayingasSearch

•  StatesS–  DescripConofthecurrentstate/configuraConofthegame

•  PlayersP={1,2,…,n}–  Willtaketurnsinthegamesweconsider

•  AcConsA–  LegalacConsmaydependonplayerandstate

•  TransiConmodel–  DefinestheresultofanacConappliedtoastateforaparCcularplayer–  Resultisanewstate

•  Terminaltest–  FuncCononstates;returnsTifstateisaterminalstateandF

otherwise•  UClityfuncConSxP->value

–  AlsocalledobjecCvefuncConorpayofffuncCon

[AdaptedfromCS188Berkeley]

GamesvsSearchProblems

•  “Unpredictable”opponent⇒soluConisastrategy

•  Timelimits⇒unlikelytoreachterminalstates.– Mustapproximate

2/14/17

3

MinimaxSearch

•  Whenit’syourturn,generate(ideally)thecompletegametree.

•  Selectthemovethatisbestforyou,assumingthatyouropponentwill,ateachopportunity,selectthemovethatisworstforyou(andthusbestforhim/her/itself)

AnExample:2-playerzero-sumgame

1 -1

-1 1

-11 1

-1 1 -1

Max

min

Max

min

Max

min

AnExample:2-playerzero-sumgame

1 -1

-1 1

-11 1

-1 1 -1

Max

min

Max

min

Max

min

-1

AnExample:2-playerzero-sumgame

1 -1

-1 1

-11 1

-1 1 -1

Max

min

Max

min

Max

min

-1

-1

AnExample:2-playerzero-sumgame

1 -1

-1 1

-11 1

-1 1 -1

Max

min

Max

min

Max

min

-1

-1

-1

AnExample:2-playerzero-sumgame

1 -1

-1 1

-11 1

-1 1 -1

Max

min

Max

min

Max

min

-1

-1

-1

1

2/14/17

4

AnExample:2-playerzero-sumgame

1 -1

-1 1

-11 1

-1 1 -1

Max

min

Max

min

Max

min

-1

-1

-1

1

1

AnExample:2-playerzero-sumgame

1 -1

-1 1

-11 1

-1 1 -1

Max

min

Max

min

Max

min

-1

-1

-1

1

1

1

AnExample:2-playerzero-sumgame

1 -1

-1 1

-11 1

-1 1 -1

Max

min

Max

min

Max

min

-1

-1

-1

1

1

1

1

AnExample:2-playerzero-sumgame

1 -1

-1 1

-11 1

-1 1 -1

Max

min

Max

min

Max

min

-1

-1

-1

1

1

1

1 -1

AnExample:2-playerzero-sumgame

1 -1

-1 1

-11 1

-1 1 -1

Max

min

Max

min

Max

min

-1

-1

-1

1

1

1

1 -1

MinimaxSearchrevisited

•  Astate-spacesearchtree•  Playersalternateturns•  Eachnodehasaminimaxvalue:bestachievableuClityagainstaraConaladversary

[AdaptedfromCS188Berkeley]

2/14/17

5

AnotherExample

7 26 3 0 6-2 52 96 2

AnotherExample

7 26 3 0 6-2 52 96 2

7

AnotherExample

7 26 3 0 6-2 52 96 2

7 3

AnotherExample

7 26 3 0 6-2 52 96 2

7 3

3

AnotherExample

7 26 3 0 6-2 52 96 2

7 3

3

0

AnotherExample

7 26 3 0 6-2 52 96 2

7 3

3

0 6

2/14/17

6

AnotherExample

7 26 3 0 6-2 52 96 2

7 3

3

0 6

0

AnotherExample

7 26 3 0 6-2 52 96 2

7 3

3

0 6

0

6

AnotherExample

7 26 3 0 6-2 52 96 2

7 3

3

0 6

0

6 9

AnotherExample

7 26 3 0 6-2 52 96 2

7 3

3

0 6

0

6 9

6

AnotherExample

7 26 3 0 6-2 52 96 2

7 3

3

0 6

0

6 9

6

Butreallydonedepth-first

2/14/17

7

Really… Really…

Really…

7

Really…

7

Really…

6

7

Really…

7

2/14/17

8

Really…

7

Really…

7

Really…

7

2

Really…

7

2

Really…

7

2

3

Really…

7

3

2/14/17

9

Really…

3

Really…3

Really…3

Really…3

function MINIMAX-DECISION(state) returns an action areturn arg max a in ACTIONS(state) MIN-VALUE(RESULT(state, a))

function MIN-VALUE(state) returns a utility value v if TERMINAL-TEST(state) then return UTILITY(state) v = infinity for each a in ACTIONS(state) do v = MIN(v, MAX-VALUE(RESULT(state, a))) return v

function MAX-VALUE(state) returns a utility value v if TERMINAL-TEST(state) then return UTILITY(state) v = -infinity for each a in ACTIONS(state) do v = MAX(v, MIN-VALUE(RESULT(state, a))) return v

MinimaxReality•  CanrarelyexploreenCresearchspacetoterminalnodes.

–  DFShasgoodspacecomplexity,butbadCmecomplexity•  Chooseadepthcutoff–i.e.,amaximumply•  NeedanevaluaConfuncCon

–  ReturnsanesCmateoftheexpecteduClityofthegamefromagivenposiCon

–  MustordertheterminalstatesinthesamewayasthetrueuClityfuncCon

–  Mustbeefficienttocompute•  TradingoffpliesforheurisCccomputaCon•  Morepliesmakesadifference

•  ConsideriteraCvedeepening

2/14/17

10

EvaluaConFuncCons

•  Ideal:returnstheuClityoftheposiCon•  InpracCce:typicallyweightedlinearsumoffeatures:

•  Eval(s)=w1f1(s)+w2f2(s)+…+wnfn(s)

Exercise

•  EvaluaConfuncConforConnectFour?