lecture6games - dept.cs.williams.edu

2/14/17

Andrea Danyluk, February 15, 2017

Announcements

•  Programming Assignment 1: Search
   –  Still in progress
   –  A note about designing heuristics:
      •  Add a “feature” at a time
      •  Consider different weights for different features
      •  Think beyond adding heuristic information together
      •  Once you have a function that works well, remove elements to determine whether you really need them

•  Games (repeated from last time)
   –  Planning/problem solving in the presence of an adversary ⇒ adversarial search
   –  Why games?
      •  Easy to measure success or failure
      •  States and rules are generally easy to specify
      •  Interesting and complex
         –  Space and time complexity
         –  Uncertainty of adversaries’ actions, rolls of dice, etc.

•  AlphaGo became the first program to beat a human professional Go player without handicaps on a full 19×19 board.
   •  In Go, the branching factor b > 300
   •  Uses Monte Carlo tree search to select moves.
   •  Uses knowledge learned from a combination of reinforcement and deep learning.

Backgammon

•  TD-Gammon uses depth-2 search + very good evaluation function + reinforcement learning (Gerry Tesauro, IBM)
   •  World-champion level play
   •  1st AI world champion in any game!

•  Libratus [Sandholm and Brown, CMU] won $1.7m (in chips) from 4 professional poker players over 20 days in January 2017
   •  No-limit Texas Hold’em
   •  Hard because it’s a game of imperfect information: you can’t see the opponent’s hand.

•  The “final frontier” in games…

[Adapted from CS 188 Berkeley]


Types of Games

                        Deterministic                        Chance
Perfect information     Chess, Checkers, Go, Connect Four    Backgammon
Imperfect information   Battleship, Guess Who?               Bridge, Poker, Scrabble

[Adapted from Russell and from CS 188 Berkeley]


Want algorithms for calculating a strategy (policy) that recommends a move in each state

Connect Four Demo

•  With perfect play, the first player can force a win by starting in the middle column.

•  By starting in one of the two adjacent columns, the first player allows the second player to reach a draw.

•  By starting in any of the four outer columns, the first player allows the second player to force a win.

•  There exist perfect players; my demo program is not one of them.

Game Playing as a Search Problem

Note that each level in the game tree (i.e., each half-move) is called a ply.

Formulating Game Playing as Search

•  States S
   –  Description of the current state/configuration of the game

•  Players P = {1, 2, …, n}
   –  Will take turns in the games we consider

•  Actions A
   –  Legal actions may depend on player and state

•  Transition model
   –  Defines the result of an action applied to a state for a particular player
   –  Result is a new state

•  Terminal test
   –  Function on states; returns T if state is a terminal state and F otherwise

•  Utility function S × P → value
   –  Also called objective function or payoff function
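To make the six components concrete, here is a minimal sketch in Python for a hypothetical two-player "take-away" game (not from the slides): players alternately remove 1 or 2 stones from a pile, and whoever takes the last stone wins. The class and method names (`TakeAwayGame`, `actions`, `result`, `terminal_test`, `utility`) are illustrative assumptions chosen to mirror the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class State:
    stones: int   # stones left in the pile
    to_move: int  # player whose turn it is (1 or 2)


class TakeAwayGame:
    """Hypothetical toy game: players alternately remove 1 or 2 stones,
    and whoever takes the last stone wins."""

    players = (1, 2)  # P = {1, 2}

    def initial_state(self, stones: int) -> State:
        return State(stones, 1)

    def actions(self, s: State) -> List[int]:
        # Legal actions depend on the state: you cannot take more stones than remain.
        return [k for k in (1, 2) if k <= s.stones]

    def result(self, s: State, a: int) -> State:
        # Transition model: remove a stones and pass the turn to the other player.
        return State(s.stones - a, 2 if s.to_move == 1 else 1)

    def terminal_test(self, s: State) -> bool:
        return s.stones == 0

    def utility(self, s: State, p: int) -> int:
        # Utility: S x P -> value. At a terminal state the player who just moved
        # (i.e., the one NOT to move) took the last stone and wins.
        winner = 2 if s.to_move == 1 else 1
        return 1 if p == winner else -1
```

For example, from 3 stones, taking 2 leaves `State(1, 2)`; player 2 then takes the last stone, so `utility(..., 2)` is 1 and `utility(..., 1)` is -1.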

Games vs Search Problems

•  “Unpredictable” opponent ⇒ solution is a strategy

•  Time limits ⇒ unlikely to reach terminal states
   –  Must approximate


Minimax Search

•  When it’s your turn, generate (ideally) the complete game tree.

•  Select the move that is best for you, assuming that your opponent will, at each opportunity, select the move that is worst for you (and thus best for him/her/itself)

An Example: 2-player zero-sum game

[Figure: example game tree for a 2-player zero-sum game; terminal utilities include -1, 1, -1]

Minimax Search revisited

•  A state-space search tree
•  Players alternate turns
•  Each node has a minimax value: best achievable utility against a rational adversary
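Written as a recursive definition (the standard formulation in Russell and Norvig, with utilities taken from MAX's point of view in a two-player zero-sum game), the minimax value of a state s is:

$$
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s) & \text{if } \mathrm{TERMINAL\text{-}TEST}(s) \\
\max_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if MAX is to move in } s \\
\min_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if MIN is to move in } s
\end{cases}
$$

For instance, if MAX has two moves leading to MIN nodes whose leaves are {3, 12} and {2, 4}, those MIN nodes have values min(3, 12) = 3 and min(2, 4) = 2, so the root's minimax value is max(3, 2) = 3.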


Another Example

[Figure: a larger minimax game tree whose backed-up values are filled in step by step over several slides]

But really done depth-first

[Figure: “Really…”: the same tree, evaluated depth-first one node at a time]

function MINIMAX-DECISION(state) returns an action
    return arg max_{a in ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v = infinity
    for each a in ACTIONS(state) do
        v = MIN(v, MAX-VALUE(RESULT(state, a)))
    return v

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v = -infinity
    for each a in ACTIONS(state) do
        v = MAX(v, MIN-VALUE(RESULT(state, a)))
    return v
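The pseudocode above translates almost line for line into Python. This sketch assumes a hypothetical explicit-tree encoding (not from the slides): a leaf is a number giving its utility for MAX, an internal node is a list of child subtrees, ACTIONS are child indices, and RESULT(node, a) is `node[a]`. Because the functions recurse into one child before moving to the next, the tree is evaluated depth-first, as in the walk-through.

```python
import math
from typing import List, Union

# Hypothetical tree encoding: a leaf is a number (utility for MAX);
# an internal node is a list of child subtrees.
Tree = Union[float, List["Tree"]]


def terminal_test(node: Tree) -> bool:
    return not isinstance(node, list)


def min_value(node: Tree) -> float:
    if terminal_test(node):
        return node                      # UTILITY(state)
    v = math.inf
    for child in node:                   # for each a in ACTIONS(state)
        v = min(v, max_value(child))     # MAX replies to MIN's move
    return v


def max_value(node: Tree) -> float:
    if terminal_test(node):
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child))     # MIN replies to MAX's move
    return v


def minimax_decision(node: Tree) -> int:
    """Index of the root action with the highest MIN-VALUE (first one on ties)."""
    return max(range(len(node)), key=lambda a: min_value(node[a]))


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(max_value(tree), minimax_decision(tree))  # 3 0
```

Here the three MIN nodes back up 3, 2 and 2, so MAX's best move is the first one, with value 3.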

Minimax Reality

•  Can rarely explore entire search space to terminal nodes.
   –  DFS has good space complexity, but bad time complexity

•  Choose a depth cutoff, i.e., a maximum ply

•  Need an evaluation function
   –  Returns an estimate of the expected utility of the game from a given position
   –  Must order the terminal states in the same way as the true utility function
   –  Must be efficient to compute

•  Trading off plies for heuristic computation
   –  More plies make a difference

•  Consider iterative deepening
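A minimal sketch of depth-limited minimax with an evaluation function at the cutoff, wrapped in iterative deepening. The toy `TakeAway` game, the function names, and the time-budget handling are all hypothetical illustrations; in particular, the clock is only checked between iterations, whereas a real player would also abort an iteration in progress.

```python
import time


class TakeAway:
    """Hypothetical toy game: state = (stones, max_to_move).
    Each move removes 1 or 2 stones; whoever takes the last stone wins."""

    def actions(self, s):
        return [k for k in (1, 2) if k <= s[0]]

    def result(self, s, a):
        return (s[0] - a, not s[1])

    def terminal_test(self, s):
        return s[0] == 0

    def utility(self, s):
        # Utility for MAX: whoever just moved took the last stone.
        # If it is now MAX's turn, MIN made that move, so MAX lost.
        return -1 if s[1] else 1


def depth_limited_value(state, depth, maximizing, game, evaluate):
    """Minimax value of `state` searched to at most `depth` more plies.
    True terminal states use UTILITY; at the cutoff, `evaluate` supplies
    a heuristic estimate of the utility for MAX instead."""
    if game.terminal_test(state):
        return game.utility(state)
    if depth == 0:
        return evaluate(state)
    values = [depth_limited_value(game.result(state, a), depth - 1,
                                  not maximizing, game, evaluate)
              for a in game.actions(state)]
    return max(values) if maximizing else min(values)


def iterative_deepening_decision(state, game, evaluate,
                                 time_budget_s=1.0, max_depth=50):
    """Search 1 ply, then 2 plies, ..., keeping the best root action from the
    deepest completed search. Assumes MAX is to move at the root."""
    deadline = time.monotonic() + time_budget_s
    best_action = None
    for depth in range(1, max_depth + 1):
        best_action = max(
            game.actions(state),
            key=lambda a: depth_limited_value(game.result(state, a), depth - 1,
                                              False, game, evaluate))
        if time.monotonic() >= deadline:
            break
    return best_action


print(iterative_deepening_decision((5, True), TakeAway(), lambda s: 0))  # 2
```

With the uninformative evaluation `lambda s: 0`, a 1-ply search cannot tell the two moves apart, but deeper iterations discover that taking 2 of 5 stones leaves MIN a losing pile of 3. This is the "more plies make a difference" trade-off in miniature.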


Evaluation Functions

•  Ideal: returns the utility of the position
•  In practice: typically a weighted linear sum of features:

   $\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s)$
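A minimal sketch of the weighted linear sum, where each feature is just a function of the state. The state encoding and the two chess-style material features (queen and pawn differences, weighted 9 and 1 as in the usual material values) are hypothetical illustrations, not a recommended evaluation function.

```python
from typing import Callable, Sequence


def linear_eval(state, weights: Sequence[float],
                features: Sequence[Callable[[object], float]]) -> float:
    """Weighted linear evaluation: Eval(s) = w1*f1(s) + ... + wn*fn(s)."""
    if len(weights) != len(features):
        raise ValueError("need exactly one weight per feature")
    return sum(w * f(state) for w, f in zip(weights, features))


# Hypothetical state encoding: each entry is (MAX's count, MIN's count).
def queen_diff(s):
    return s["queens"][0] - s["queens"][1]


def pawn_diff(s):
    return s["pawns"][0] - s["pawns"][1]


state = {"queens": (1, 0), "pawns": (5, 7)}
print(linear_eval(state, [9, 1], [queen_diff, pawn_diff]))  # 9*1 + 1*(5-7) = 7
```

Keeping features and weights as separate lists makes it easy to follow the heuristic-design advice from the announcements: add one feature at a time, try different weights, and zero out a weight to check whether a feature is really needed.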

Exercise

•  Evaluation function for Connect Four?
