Extensive Games
Raunak Jain
7/29/2019
Concentrating on moves rather than on whole strategies is the basic difference between best-response and reinforcement learning methods, and games in extensive form, due to their size, can only be solved as a reinforcement learning problem. In the setup of the paper, the game is played as follows (the exploratory myopic strategy rule): the player uses a valuation method where he/she evaluates a strategy by averaging the values received from playing that strategy in the rounds before the current one, and then revises the valuation after playing the action with the highest valuation. So in each round he myopically chooses the strategy with the best valuation at that round, which is not necessarily the maximizing strategy for the repeated game.
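The valuation rule described above can be sketched in code. This is my own minimal rendering under simplifying assumptions (a single learning player, a fixed stage-game payoff per action, exploration with a small probability epsilon); the function and parameter names are mine, not the paper's:

```python
import random

def play_repeated_game(payoff, n_actions, rounds, epsilon=0.1, seed=0):
    """Exploratory myopic play: keep a running-average valuation per action,
    play the action with the highest current valuation, and explore a random
    action with probability epsilon so every action keeps getting sampled."""
    rng = random.Random(seed)
    totals = [0.0] * n_actions   # cumulative payoff received from each action
    counts = [0] * n_actions     # number of times each action was played
    for _ in range(rounds):
        if rng.random() < epsilon:          # exploratory move
            a = rng.randrange(n_actions)
        else:                               # myopic move: best current valuation
            vals = [totals[i] / counts[i] if counts[i] else 0.0
                    for i in range(n_actions)]
            a = max(range(n_actions), key=lambda i: vals[i])
        r = payoff(a, rng)                  # stage-game payoff this round
        totals[a] += r                      # revise the valuation by averaging
        counts[a] += 1
    return [totals[i] / counts[i] if counts[i] else 0.0 for i in range(n_actions)]
```

For example, with two actions where action 1 always pays 1 and action 0 always pays 0, the valuations converge to roughly [0, 1], and the myopic choice settles on action 1; the epsilon exploration is what lets the player discover action 1 in the first place, since both valuations start tied at 0.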
Given these updating rules, the following can be said about the convergence of the payoffs of the learning/non-learning players in the long run.
a) A learning player who plays in a scenario where only he learns earns, after some time point, a payoff which is at least (1 - epsilon) times his minimax payoff in the stage game. So the rule in a way guarantees the player a payoff regardless of what the other players do; in this scenario the learning player learns to play the stage game itself, not against what the opponents might play.
b) In the case where all players learn, the payoff to each player is greater than what he/she would have got in case (a). Under the earlier assumptions, with every player learning simultaneously, the whole game converges to the subgame perfect Nash equilibrium.
In win-lose games, where the payoffs are either 1 or 0 (the player either wins or loses), the exploratory myopic strategy rule can be eased into a purely myopic one, where moves without the highest valuation are not played at all. The averaging process is also modified to consider only the last round rather than the whole history (memoryless revision). In such a case a player who can win the stage game always wins in the long run if he/she follows the rules. No assumption is made on the strategies of the other players, and since their strategies might still depend on the history, the process generated cannot be modeled as a Markov process.
In the 0-1 game example, the learning process described, which deletes the strategy with the least payoff, is not the same as assigning probability 0 to the moves without the highest valuation.
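The memoryless win-lose variant can be sketched as follows. This is again my own toy rendering: I assume a deterministic stage game encoded as a payoff list (1 = win, 0 = lose) and break valuation ties by lowest index, neither of which the paper necessarily does:

```python
def memoryless_myopic(payoff, init_vals, rounds):
    """Win-lose play with memoryless revision: an action's valuation is just
    the payoff (0 or 1) it earned the last time it was played; each round the
    player plays only a highest-valued action (ties -> lowest index)."""
    vals = list(init_vals)
    outcomes = []
    for _ in range(rounds):
        a = max(range(len(vals)), key=lambda i: vals[i])
        r = payoff[a]        # 1 = win, 0 = lose in the stage game
        vals[a] = r          # memoryless: overwrite, no averaging over history
        outcomes.append(r)
    return outcomes

# With a winning action available and positive initial valuations, each losing
# action is tried at most once (its valuation drops to 0 and it is never
# revisited), after which the player wins forever.
print(memoryless_myopic([0, 0, 1], [0.5, 0.5, 0.5], 8))  # [0, 0, 1, 1, 1, 1, 1, 1]
```

This illustrates the "always wins in the long run" claim in the simplest possible setting: under memoryless revision a losing move poisons its own valuation immediately, so the positive initial valuations eventually funnel play to the winning move.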
Qs. If the player assumes that the other players play with a certain fixed strategy, and he is aware of the payoffs, doesn't the process then become a Markov process?
Since the learning process assumes nothing about the other players and only considers what she/he gets out of a move, it is a kind of low-information-requirement strategy; features like the similarity of moves also make it efficient to use on a computer.
Theorem 1 considers a win-lose situation: given a positive initial valuation for every strategy i in the super-strategy set, if the game has a strategy which guarantees the player a win, then under myopic play with memoryless revision there is a time after which the player wins forever.
In the example where player a evaluates the initial payoffs to L and R as 0, it is shown that player a cannot reach the winning strategy (the one that gives him the winning payoff) with probability 1, and can only reach it with probability less than 1.
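The role of the initial valuations in the L/R example can be replayed with a tiny simulation. This is a hypothetical reconstruction, not the paper's example: I use deterministic lowest-index tie-breaking, so the zero-valuation case gets stuck with certainty, whereas with randomized tie-breaking the winning move would be reached only with some probability:

```python
def play(vals, payoff, rounds=6):
    """Myopic play with memoryless revision: pick the highest-valued move
    (ties go to the lowest index), then replace that move's valuation with
    the payoff just received (win-lose payoffs: 0 or 1)."""
    vals = list(vals)
    moves = []
    for _ in range(rounds):
        a = max(range(len(vals)), key=lambda i: vals[i])
        vals[a] = payoff[a]         # overwrite: only the last round counts
        moves.append("LR"[a])
    return "".join(moves)

payoff = [0, 1]                     # L always loses, R always wins
print(play([0.0, 0.0], payoff))     # "LLLLLL": zero valuations lock in L
print(play([0.5, 0.5], payoff))     # "LRRRRR": positive valuations reach R
```

With both initial valuations at 0, playing L yields payoff 0, which leaves L's valuation tied with R's, so the tie-break keeps selecting L and R is never tried; with positive initial valuations, one losing trial of L drops it below R and play switches to R forever, which is the mechanism behind Theorem 1.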
Qs. Is giving positive initial valuations to the actions enough to guarantee convergence to a winning strategy, or are there cases where even positive initial valuations can lead to some strategies being ruled out?