Page 1: Andrew Gilpin  and  Tuomas Sandholm Carnegie Mellon University Computer Science Department

A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation

Andrew Gilpin and Tuomas Sandholm
Carnegie Mellon University
Computer Science Department


Motivation: Poker

• Poker games are wildly popular card games
  – 2006 World Series of Poker
    • $82M at World Championship event
    • Portions broadcast on ESPN
• Presents several challenges for AI
  – Imperfect information
  – Risk assessment and management
  – Deception (bluffing, slow-playing)
  – Counter-deception (calling a bluff, addressing slow play)


Prior poker research

• Simulation/Learning [e.g. Findler 77, Billings et al 99, 02]
  – Do not take multi-agent aspect directly into account
• Game-theoretic
  – Small games [e.g. vN-M 44, Nash & Shapley 50, Kuhn 50]
  – Tournament games [Miltersen & Sørensen 06]
  – Manual abstraction for large games
    • “Approximating Game-Theoretic Optimal Strategies for Full-scale Poker”, Billings, Burch, Davidson, Holte, Schaeffer, Schauenberg, Szafron, IJCAI-03
  – Ours: Automated abstraction for large games
    • As computing speed increases, we can automatically take advantage of it by simply rerunning the abstraction algorithm with a different parameter to produce a finer-grained abstraction
    • We apply our techniques to Texas Hold’em poker, the most popular poker variant


Computing equilibrium

• In two-person zero-sum games,
  – Nash equilibria are minimax equilibria, so there is no equilibrium selection problem
  – Equilibrium can be found using LP
• Any extensive form game (satisfying perfect recall) can be converted into a matrix game
  – Create one pure strategy in the matrix game for every possible pure contingency plan in the sequential game (cross product of actions at information sets)
  – Leads to exponential blowup in number of strategies, even in the reduced normal form
• Sequence form: More compact representation based on sequences of moves rather than pure strategies [von Stengel 96, Koller & Megiddo 92, Romanovskii 62]
  – Two-person zero-sum games with perfect recall can be solved in time polynomial in size of game tree
  – Not enough to solve Rhode Island Hold’em (3.1 billion nodes) or Texas Hold’em (10^18 nodes)
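
To make the blowup concrete, the following minimal Python sketch compares, for one player, the number of pure strategies in the unreduced normal form (cross product of actions at information sets) with the number of sequences in the sequence form. The function names and the example of 10 information sets with 2 actions each are assumptions for illustration and do not come from the slides.

```python
from math import prod

def normal_form_size(actions_per_infoset):
    """Pure strategies in the unreduced normal form: one action picked at every information set."""
    return prod(actions_per_infoset)

def sequence_form_size(actions_per_infoset):
    """Sequences for a player with perfect recall: the empty sequence plus one per action."""
    return 1 + sum(actions_per_infoset)

# Hypothetical player with 10 information sets and 2 actions at each.
infosets = [2] * 10
print(normal_form_size(infosets))    # 1024
print(sequence_form_size(infosets))  # 21
```

The first count grows as a product and the second as a sum, which is why the sequence form stays linear in the size of the game tree.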


Our prior work on automated abstraction [EC-06]

• Automatic method for performing abstractions in a broad class of sequential games of imperfect information

• Equilibrium-preserving game transformation, where certain information sets are merged and certain nodes within an information set are collapsed

• GameShrink, algorithm for identifying and applying all the game transformations
  – Õ(n^2) time
    • n = #nodes in the signal tree. In poker, these are possible card deals in the game
    • Run-time tends to be highly sublinear in the size of the game tree
• Used these techniques to solve Rhode Island Hold’em
  – Largest poker game solved to date by over four orders of magnitude
• Also developed approximate (lossy) version of GameShrink
  – Uses a similarity metric on nodes in the signal tree (e.g., |#wins1 - #wins2| + |#losses1 - #losses2|) and a similarity threshold
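
The example metric above can be sketched in a few lines of Python. Representing a signal-tree node as a (wins, losses) pair, the function names, and merging when the distance is at most the threshold are all assumptions for illustration; the slides only give the formula.

```python
def similarity_distance(node_a, node_b):
    """|#wins1 - #wins2| + |#losses1 - #losses2| for two (wins, losses) pairs."""
    return abs(node_a[0] - node_b[0]) + abs(node_a[1] - node_b[1])

def can_merge(node_a, node_b, threshold):
    """Assumed rule: treat two nodes as the same abstract node if they are within the threshold."""
    return similarity_distance(node_a, node_b) <= threshold

# Hypothetical win/loss counts for two hands.
print(similarity_distance((600, 400), (590, 405)))       # 15
print(can_merge((600, 400), (590, 405), threshold=20))   # True
```

A threshold of 0 merges only nodes with identical counts; larger thresholds give coarser, lossy abstractions.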


Example: Applying the ordered game isomorphic abstraction transformation


Optimized approximate abstractions

• Original version of GameShrink yielded lopsided abstractions when used as an approximation algorithm

• Now we instead find an abstraction via clustering:
• For each level of the tree (starting from root):
  – For each group of hands:
    • use k-means clustering to split group i into k_i abstract “states”
      – win probability as the similarity metric (ties count as half a win)
    • for each value of k_i, compute expected error (considering hand probs)
  – We find, using integer programming, an abstraction (split of K into k_i’s) that minimizes this expected error, subject to a constraint on the total number of states, K, at that level
    • (=size of the resulting LP in the zero-sum case)
    • Solving this class of integer programs is quite easy in practice
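
The per-level procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: hands are reduced to scalar win probabilities, the error is the within-cluster squared error weighted by a group probability, and a small dynamic program stands in for the integer program. All function names and the example data are hypothetical.

```python
def kmeans_1d(values, k, iters=50):
    """Lloyd's k-means on scalars; returns (centers, total squared error)."""
    pts = sorted(values)
    # Deterministic initialisation: spread the initial centers over the sorted points.
    centers = [pts[(2 * j + 1) * len(pts) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in pts:
            nearest = min(range(k), key=lambda c: (v - centers[c]) ** 2)
            buckets[nearest].append(v)
        # An empty cluster keeps its previous center.
        new = [sum(b) / len(b) if b else centers[j] for j, b in enumerate(buckets)]
        if new == centers:
            break
        centers = new
    error = sum(min((v - c) ** 2 for c in centers) for v in pts)
    return centers, error

def allocate_states(groups, weights, K):
    """Pick k_i >= 1 per group with sum(k_i) <= K, minimising the weighted error.

    A dynamic program is used here in place of the integer program on the slide.
    """
    err = [[weights[i] * kmeans_1d(g, k)[1] for k in range(1, min(len(g), K) + 1)]
           for i, g in enumerate(groups)]
    best = {0: (0.0, [])}  # states used so far -> (cost, allocation)
    for row in err:
        nxt = {}
        for used, (cost, alloc) in best.items():
            for k, e in enumerate(row, start=1):
                if used + k > K:
                    break
                cand = (cost + e, alloc + [k])
                if used + k not in nxt or cand[0] < nxt[used + k][0]:
                    nxt[used + k] = cand
        best = nxt
    return min(best.values(), key=lambda t: t[0])

# Two hypothetical groups of win probabilities and a budget of K = 3 states.
groups = [[0.1, 0.1, 0.9, 0.9], [0.5, 0.5, 0.5, 0.5]]
cost, allocation = allocate_states(groups, weights=[0.5, 0.5], K=3)
print(allocation)  # [2, 1]
```

In this toy input the first group contains two clearly different kinds of hands, so it receives two states, while the homogeneous second group needs only one.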


Application to Texas Hold’em

• Two-person game tree has ~10^18 leaves
  – Too large to run lossless GameShrink
  – Even after that, LP would be too large
    • Already too large when we applied this to first two rounds
• We split the 4 betting rounds into two phases
  – Phase I (first 3 rounds) solved offline using new approximate version of GameShrink followed by LP
  – Phase II (last 2 rounds):
    • abstractions computed offline
    • real-time equilibrium computation using updated hand probabilities and anytime LP


Phase I (first three rounds)

• Payoffs at leaves computed assuming rollout for rest of the game
• Automated abstraction using approximate version of GameShrink
  – Round 1
    • There are 1,326 hands, of which 169 are strategically different
    • We consider 15 strategically different hands
  – Round 2
    • There are 25,989,600 distinct possible hands
    • GameShrink (in lossless mode for Phase I) determines that there are about a million strategically different hands
    • This is still too large to solve
    • We used GameShrink to compute an abstraction that considers 225 strategically different hands
  – Round 3
    • There are 1,221,511,200 distinct possible hands
    • We consider 900 strategically different hands
  – This process took about 3 days running on 4 CPUs
• LP solve took 7 days and 80 gigabytes using CPLEX’s barrier method (interior-point method for linear programming)
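
The hand counts quoted for each round follow from simple combinatorics, which the short Python check below reproduces. The variable names are illustrative; the decomposition of the 169 starting hands into pairs, suited, and offsuit rank combinations is standard but is not spelled out on the slide.

```python
from math import comb

round1 = comb(52, 2)            # two hole cards
round2 = round1 * comb(50, 3)   # plus three community cards
round3 = round2 * 47            # plus one more community card

# Strategically different starting hands: 13 pairs + 78 suited + 78 offsuit rank combinations.
distinct_round1 = 13 + 2 * comb(13, 2)

print(round1, distinct_round1)  # 1326 169
print(round2)                   # 25989600
print(round3)                   # 1221511200
```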


Mitigating effect of round-based abstraction (i.e., having 2 phases)

• For leaves in the first phase, we could assume no betting in the later rounds
  – Ignores implied odds
• Can do better by estimating the amount of betting that occurs in later rounds
  – Incorporate this information into the LP for the first phase
• For each possible hand strength and in each possible betting situation, we store the probability of each possible action
  – Mine the betting history in the later rounds from hundreds of thousands of played hands
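
One simple way to build such a table is to count observed actions per (hand strength, betting situation) pair and normalise. The sketch below assumes hand strength and betting situation have already been discretised into labels; the labels, the function name, and the sample data are hypothetical and not taken from the slides.

```python
from collections import Counter, defaultdict

def estimate_action_probs(observations):
    """observations: iterable of (hand_strength, betting_situation, action) tuples."""
    counts = defaultdict(Counter)
    for strength, situation, action in observations:
        counts[(strength, situation)][action] += 1
    # Normalise the counts of each (strength, situation) pair into probabilities.
    return {
        key: {action: n / sum(ctr.values()) for action, n in ctr.items()}
        for key, ctr in counts.items()
    }

# Hypothetical mined history: four observations of one situation.
history = [
    ("strong", "facing_bet", "raise"),
    ("strong", "facing_bet", "raise"),
    ("strong", "facing_bet", "raise"),
    ("strong", "facing_bet", "call"),
]
table = estimate_action_probs(history)
print(table[("strong", "facing_bet")])  # {'raise': 0.75, 'call': 0.25}
```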


Example of betting in fourth round

Player 1 has bet. Player 2 to fold, call, or raise


Phase II (last two rounds)

• Abstractions computed offline
  – Betting history doesn’t matter => (52 choose 4) situations
  – Simple suit isomorphisms at the root of Phase II halves this
  – For each such setting, we use GameShrink to generate an abstraction with 10 and 100 strategically different hands in the last two rounds, respectively
• Real-time equilibrium computation (using LP)
  – So that our strategies are specific to particular hand (too many to precompute)
  – Updated hand probabilities from Phase I equilibrium using betting histories and community card history:
    • s_i is player i’s strategy, h is an information set
  – Conditional choice of primal vs. dual simplex
    • Achieve anytime capability for the player that is us
  – Dealing with running off the equilibrium path
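
The transcript does not preserve the update formula itself. A natural reading is a Bayes' rule update of the hand distribution given the observed history, and the sketch below shows that reading only as an assumption. The function name, the dictionary representation, and the fallback to the prior when the history has zero probability (off the equilibrium path) are all illustrative choices, not the authors' method.

```python
def update_hand_probs(prior, likelihood):
    """Bayes' rule: posterior(hand) is proportional to prior(hand) * P(history | hand, strategy).

    prior:      dict mapping hand -> probability before observing the history
    likelihood: dict mapping hand -> probability that the strategy s_i produces
                the observed history h when holding that hand
    """
    joint = {hand: prior[hand] * likelihood.get(hand, 0.0) for hand in prior}
    total = sum(joint.values())
    if total == 0.0:
        # Assumed fallback: the history is impossible under the strategy, so keep the prior.
        return dict(prior)
    return {hand: p / total for hand, p in joint.items()}

# Hypothetical two-hand example: the observed betting is far more likely with hand "A".
posterior = update_hand_probs({"A": 0.5, "B": 0.5}, {"A": 0.9, "B": 0.1})
print(round(posterior["A"], 3), round(posterior["B"], 3))  # 0.9 0.1
```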


Precompute several databases

• db5: possible wins and losses (for a single player) for every combination of two hole cards and three community cards (25,989,600 entries)
  – Used by GameShrink for quickly comparing the similarity of two hands
• db223: possible wins and losses (for both players) for every combination of pairs of two hole cards and three community cards based on a roll-out of the remaining cards (14,047,378,800 entries)
  – Used for computing payoffs of the Phase I game to speed up the LP creation
• handval: concise encoding of a 7-card hand rank used for fast comparisons of hands (133,784,560 entries)
  – Used in several places, including in the construction of db5 and db223
• Colexicographical ordering used to compute indices into the databases allowing for very fast lookups
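
A colexicographic index maps each set of distinct cards to a unique integer without any search, which is what makes direct array lookups possible. The sketch below uses the standard combinatorial number system; numbering cards 0 to 51 and the function name are assumptions, and the actual layout of db5, db223, and handval is not described on the slide.

```python
from itertools import combinations
from math import comb

def colex_index(cards):
    """Rank of a set of distinct card numbers (0..51) in colexicographic order."""
    return sum(comb(c, i + 1) for i, c in enumerate(sorted(cards)))

print(colex_index([0, 1]))    # 0
print(colex_index([50, 51]))  # 1325

# Every two-card hand gets a distinct index in 0..1325.
indices = {colex_index(hand) for hand in combinations(range(52), 2)}
print(indices == set(range(comb(52, 2))))  # True

# The 7-card table size matches the handval entry count.
print(comb(52, 7))  # 133784560
```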


Experimental results

• GS1: Game theory-based player, old version of manual abstraction, no strategy simulation in later rounds [GS 2006]

• Sparbot: Game theory-based player, manual abstraction [Billings et al 2003]

• Vexbot: Opponent modeling, miximax search with statistical sampling [Billings et al 2004]

Opponent    Series won    Win rate (small bets per 100)
GS1         38 of 50      +3.12
Sparbot     28 of 50      +0.43
Vexbot      32 of 50      -0.62


Summary

• Competitive Texas Hold’em player automatically generated
  – First phase (rounds 1, 2 & 3): automated abstraction & LP solved offline, using statistical data to compute payoffs at end of round 3
  – Second phase (rounds 3 & 4): abstraction precomputed automatically; LP solved in real-time using updated hand probabilities and anytime

• Techniques are applicable to many sequential games of imperfect information


Where to from here?

• The top poker-playing programs are fairly equal
• Recent experimental results show our player is competitive with (but not better than) expert human players
• Provable approximation, e.g., ex post
• Other types of abstraction
• More scalable equilibrium-finding algorithms
• Tournament poker [e.g. Miltersen & Sørensen 06]
• More than two players [e.g. Nash & Shapley 50]

Thank you