lecture6games - dept.cs.williams.edu


2/14/17


Games

Andrea Danyluk
February 15, 2017

Announcements

•  Programming Assignment 1: Search
   –  Still in progress
   –  A note about designing heuristics:
      •  Add a “feature” at a time
      •  Consider different weights for different features
      •  Think beyond adding heuristic information together
      •  Once you have a function that works well, remove elements to determine whether you really need them

Today

•  Games (repeated from last time)
   –  Planning/problem solving in the presence of an adversary ⇒ adversarial search
   –  Why games?
      •  Easy to measure success or failure
      •  States and rules are generally easy to specify
      •  Interesting and complex
         –  Space and time complexity
         –  Uncertainty of adversaries’ action, rolls of dice, etc.

Go

•  AlphaGo became the first program to beat a human professional Go player without handicaps on a full 19x19 board.
•  In Go, the branching factor b > 300
•  Uses Monte Carlo tree search to select moves.
•  Uses knowledge learned from a combination of reinforcement and deep learning.

Backgammon

•  TD-Gammon uses depth-2 search + very good evaluation function + reinforcement learning (Gerry Tesauro, IBM)
•  World-champion level play
•  1st AI world champion in any game!

Poker

•  Libratus [Sandholm and Brown, CMU] won $1.7m (in chips) from 4 professional poker players over 20 days in January 2017
•  No-limit Texas Hold’em
•  Hard because it’s a game of imperfect information. Can’t see the opponent’s hand.
•  The “final frontier” in games…

[Adapted from CS188 Berkeley]


Types of Games

                         Deterministic                        Chance
Perfect Information      Chess, Checkers, Go, Connect Four    Backgammon
Imperfect Information    Battleship, Guess Who?               Bridge, Poker, Scrabble

[Adapted from Russell and from CS188 Berkeley]


Want algorithms for calculating a strategy (policy) that recommends a move in each state

Connect Four Demo

•  With perfect play, first player can force a win by starting in the middle column.

•  By starting in one of the two adjacent columns, the first player allows the second player to reach a draw.

•  By starting in any of the four outer columns, the first player allows the second player to force a win.

•  There exist perfect players; my demo program is not one of them.

Game Playing as a Search Problem

Note that each level in the game tree (i.e., each half move) is called a ply.

Formulating Game Playing as Search

•  States S
   –  Description of the current state/configuration of the game
•  Players P = {1, 2, …, n}
   –  Will take turns in the games we consider
•  Actions A
   –  Legal actions may depend on player and state
•  Transition model
   –  Defines the result of an action applied to a state for a particular player
   –  Result is a new state
•  Terminal test
   –  Function on states; returns T if state is a terminal state and F otherwise
•  Utility function S x P -> value
   –  Also called objective function or payoff function

[Adapted from CS188 Berkeley]
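The components of this formulation map naturally onto a small interface. The following is a minimal sketch, assuming a hypothetical single-pile Nim game (players alternately remove 1 to 3 stones, and whoever takes the last stone wins). The class and method names are illustrative and do not come from the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NimState:
    """State: stones left in the pile and whose turn it is (player 1 or 2)."""
    stones: int
    to_move: int


class Nim:
    """Hypothetical single-pile Nim: take 1-3 stones; taking the last stone wins."""

    players = (1, 2)

    def actions(self, state):
        # Legal actions depend on the state: cannot take more stones than remain.
        return [n for n in (1, 2, 3) if n <= state.stones]

    def result(self, state, action):
        # Transition model: applying an action to a state yields a new state.
        next_player = 2 if state.to_move == 1 else 1
        return NimState(state.stones - action, next_player)

    def terminal_test(self, state):
        # True exactly when no stones remain.
        return state.stones == 0

    def utility(self, state, player):
        # Utility function S x P -> value, defined here for terminal states.
        # The player who just moved (not state.to_move) took the last stone.
        winner = 2 if state.to_move == 1 else 1
        return 1 if player == winner else -1


game = Nim()
start = NimState(stones=4, to_move=1)
after_first = game.result(start, 3)         # player 1 takes 3 stones
after_second = game.result(after_first, 1)  # player 2 takes the last stone
```

Here `after_second` is terminal, and `game.utility(after_second, 2)` is 1 because player 2 took the last stone.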

Games vs Search Problems

•  “Unpredictable” opponent ⇒ solution is a strategy

•  Time limits ⇒ unlikely to reach terminal states.
   –  Must approximate


Minimax Search

•  When it’s your turn, generate (ideally) the complete game tree.

•  Select the move that is best for you, assuming that your opponent will, at each opportunity, select the move that is worst for you (and thus best for him/her/itself)

An Example: 2-player zero-sum game

[Figure: game tree with alternating Max and min levels and terminal utilities of 1 and -1. Over successive slides, minimax values are backed up from the leaves toward the root, one node at a time.]

Minimax Search revisited

•  A state-space search tree
•  Players alternate turns
•  Each node has a minimax value: best achievable utility against a rational adversary

[Adapted from CS188 Berkeley]
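The minimax value of a node can be written recursively. The following is the standard textbook definition for a two-player zero-sum game, with utilities stated from MAX's point of view; the function names are notation chosen here, not taken from the slides.

```latex
\[
\mathrm{Minimax}(s) =
\begin{cases}
\mathrm{Utility}(s) & \text{if } s \text{ is a terminal state} \\
\max_{a \in \mathrm{Actions}(s)} \mathrm{Minimax}(\mathrm{Result}(s, a)) & \text{if MAX moves in } s \\
\min_{a \in \mathrm{Actions}(s)} \mathrm{Minimax}(\mathrm{Result}(s, a)) & \text{if MIN moves in } s
\end{cases}
\]
```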


Another Example

[Figure: game tree with numeric leaf utilities (leaf row as extracted: 7 26 3 0 6-2 52 96 2). Over successive slides, backed-up minimax values are filled in level by level, one node at a time.]

But really done depth-first

[Figure: sequence of “Really…” slides showing the tree evaluated depth-first, with values filled in and updated in the order the search visits the nodes.]

function MINIMAX-DECISION(state) returns an action a
    return arg max_{a in ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MIN-VALUE(state) returns a utility value v
    if TERMINAL-TEST(state) then return UTILITY(state)
    v = infinity
    for each a in ACTIONS(state) do
        v = MIN(v, MAX-VALUE(RESULT(state, a)))
    return v

function MAX-VALUE(state) returns a utility value v
    if TERMINAL-TEST(state) then return UTILITY(state)
    v = -infinity
    for each a in ACTIONS(state) do
        v = MAX(v, MIN-VALUE(RESULT(state, a)))
    return v
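The pseudocode translates almost line for line into Python. The sketch below assumes a hypothetical two-ply game tree stored as nested dictionaries (MAX moves at the root, MIN moves next, leaves are utilities for MAX); the tree and its action names are made up for illustration.

```python
import math

# Hypothetical game tree: internal nodes map action names to children,
# leaves are utilities from MAX's point of view.
TREE = {
    "a1": {"b1": 3, "b2": 12, "b3": 8},
    "a2": {"c1": 2, "c2": 4, "c3": 6},
    "a3": {"d1": 14, "d2": 5, "d3": 2},
}


def terminal_test(state):
    # A state is terminal when it is a leaf (a number, not a dict).
    return not isinstance(state, dict)


def utility(state):
    return state


def actions(state):
    return list(state.keys())


def result(state, action):
    return state[action]


def max_value(state):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for a in actions(state):
        v = max(v, min_value(result(state, a)))
    return v


def min_value(state):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for a in actions(state):
        v = min(v, max_value(result(state, a)))
    return v


def minimax_decision(state):
    # Pick the action whose resulting state has the highest MIN-VALUE.
    return max(actions(state), key=lambda a: min_value(result(state, a)))


best = minimax_decision(TREE)
```

The three MIN nodes back up 3, 2, and 2, so `best` is `"a1"` and the root's minimax value is 3. The recursion explores the tree depth-first, just as the pseudocode does.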

Minimax Reality

•  Can rarely explore entire search space to terminal nodes.
   –  DFS has good space complexity, but bad time complexity
•  Choose a depth cutoff, i.e., a maximum ply
•  Need an evaluation function
   –  Returns an estimate of the expected utility of the game from a given position
   –  Must order the terminal states in the same way as the true utility function
   –  Must be efficient to compute
•  Trading off plies for heuristic computation
•  More plies makes a difference
•  Consider iterative deepening
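A depth cutoff and iterative deepening can be sketched as follows. This is an illustration under stated assumptions, not the lecture's code: the game is a hypothetical single-pile Nim (take 1 to 3 stones, taking the last stone wins), and `evaluate` is a deliberately uninformative placeholder that returns 0 for every non-terminal position.

```python
def actions(stones):
    # Legal moves: take 1, 2, or 3 stones, never more than remain.
    return [n for n in (1, 2, 3) if n <= stones]


def evaluate(stones, max_to_move):
    # Placeholder evaluation function (an assumption): 0.0 means "unknown".
    # A real evaluation function would estimate the expected utility for MAX.
    return 0.0


def value(stones, max_to_move, depth):
    """Depth-limited minimax value from MAX's point of view."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1.0 if max_to_move else 1.0
    if depth == 0:
        # Cutoff reached: fall back on the evaluation function.
        return evaluate(stones, max_to_move)
    child_values = [
        value(stones - a, not max_to_move, depth - 1) for a in actions(stones)
    ]
    return max(child_values) if max_to_move else min(child_values)


def best_move(stones, depth):
    # MAX chooses the move leading to the best depth-limited value.
    return max(actions(stones), key=lambda a: value(stones - a, False, depth - 1))


def iterative_deepening(stones, max_depth):
    # Search to depth 1, 2, ..., max_depth, keeping the most recent answer.
    # With a time limit, the loop would stop when time runs out instead.
    move = None
    for depth in range(1, max_depth + 1):
        move = best_move(stones, depth)
    return move


move = iterative_deepening(5, 4)
```

With 5 stones and only 1 ply of lookahead, every move looks the same to the placeholder evaluation (`value(5, True, 1)` is 0.0). With 4 plies the search reaches terminal states and finds that taking 1 stone wins, which illustrates why more plies makes a difference.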


Evaluation Functions

•  Ideal: returns the utility of the position
•  In practice: typically weighted linear sum of features:

•  Eval(s) = w_1 f_1(s) + w_2 f_2(s) + … + w_n f_n(s)
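The weighted linear sum is short to express in code. The sketch below assumes a made-up board-game state stored as a dictionary and two hypothetical features (`material` and `mobility`); neither the features nor the weights come from the lecture.

```python
def linear_eval(state, features, weights):
    """Eval(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))


# Hypothetical features: each maps a state to a number.
def material(state):
    return state["my_pieces"] - state["opp_pieces"]


def mobility(state):
    return state["my_moves"] - state["opp_moves"]


# Made-up position: ahead by 2 pieces, but with 2 fewer legal moves.
state = {"my_pieces": 5, "opp_pieces": 3, "my_moves": 4, "opp_moves": 6}
score = linear_eval(state, [material, mobility], [1.0, 0.5])
```

Here `score` is 1.0 * 2 + 0.5 * (-2) = 1.0. Keeping features as separate functions makes it easy to add one at a time, try different weights, and remove a feature to check whether it is really needed.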

Exercise

•  Evaluation function for Connect Four?
