REGRET MINIMIZATION IN BOUNDED MEMORY GAMES
Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha
Work in progress


Page 1: Regret Minimization in Bounded Memory Games

REGRET MINIMIZATION IN BOUNDED MEMORY GAMES
Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha

Work in progress

Page 2: Regret Minimization in Bounded Memory Games

MOTIVATING EXAMPLE


Employee Actions: {Behave, Violate}

Page 3: Regret Minimization in Bounded Memory Games

AUDIT PROCESS EXAMPLE

Payoffs to the Organization:

              Behave   Violate
Ignore          0        -5
Investigate    -1        -1

Game proceeds in rounds

Round   Employee   Organization   Expert        Outcome        Reward
1       Behave     Ignore         Investigate   No Violation    0
2       Violate    Ignore         Investigate   Missed V       -5
3       Violate    Investigate    Investigate   Detected V     -1
…       …          …              …             …               …

Reward of Organization: -6
Rounds: 3
Reward of Best Expert (hindsight): -3
Regret: (-3) - (-6) = 3
Average Regret: 3/3 = 1
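To make the arithmetic above concrete, here is a minimal Python sketch that recomputes the numbers from this example; the payoff table and the three rounds come from the slides above, while the variable names and structure are ours.

```python
# Recompute the regret numbers from the audit example above.
payoff = {
    ("Ignore", "Behave"): 0,
    ("Ignore", "Violate"): -5,
    ("Investigate", "Behave"): -1,
    ("Investigate", "Violate"): -1,
}

# (employee action, organization action) for the three rounds.
rounds = [("Behave", "Ignore"), ("Violate", "Ignore"), ("Violate", "Investigate")]

actual = sum(payoff[(org, emp)] for emp, org in rounds)        # -6
best_expert = max(
    sum(payoff[(a, emp)] for emp, _ in rounds)                 # play the same action every round
    for a in ("Ignore", "Investigate")
)                                                              # -3 ("always Investigate")
regret = best_expert - actual                                  # 3
average_regret = regret / len(rounds)                          # 1.0
print(actual, best_expert, regret, average_regret)
```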

Page 4: Regret Minimization in Bounded Memory Games

TALK OUTLINE
Motivation
Bounded Memory Game Model
Defining Regret
Regret Minimization in Bounded Memory Games
  Feasibility
  Complexity

Page 5: Regret Minimization in Bounded Memory Games

ELEMENTS OF GAME MODEL
Two Players: Adversary (Employee) and Defender (Organization)
Actions:
  Adversary Actions: {Violate, Behave}
  Defender Actions: {Investigate, Ignore}
Repeated Interactions:
  Each interaction has an outcome
  History of game play is a sequence of outcomes
Imperfect Information: the organization doesn't always observe the actions of the employee

Could be formalized as a repeated game

Page 6: Regret Minimization in Bounded Memory Games

ADDITIONAL ELEMENTS OF GAME MODEL
Move to a richer game model

History-dependent Rewards:
  Save money by ignoring
  Reputation possibly damaged if we Ignore and the employee did violate
  Reputation of the organization depends both on its history and on the current outcome

History-dependent Actions:
  Players' behavior may depend on history
  Defender's behavior may depend on the complete history

Page 7: Regret Minimization in Bounded Memory Games

ADVERSARY MODELS
Adversary behavior depends only on the history he remembers

Fully Oblivious Adversary – only remembers the round number

k-Adaptive Adversary – remembers the round number, but history is reset every k turns

Adaptive Adversary – remembers the complete game history
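As a small illustration of the three adversary models just listed, here is a hedged Python sketch; the class names and the next_move interface are our own, not part of the talk.

```python
class ObliviousAdversary:
    """Fully oblivious: only 'remembers' the round number, i.e. plays a fixed schedule."""
    def __init__(self, schedule):
        self.schedule = schedule

    def next_move(self, round_no, history):
        return self.schedule[round_no % len(self.schedule)]


class KAdaptiveAdversary:
    """Sees the recent history, but that history is reset every k rounds."""
    def __init__(self, k, strategy):
        self.k, self.strategy = k, strategy

    def next_move(self, round_no, history):
        m = round_no % self.k                  # rounds since the last reset
        window = history[-m:] if m else []     # empty right after a reset
        return self.strategy(round_no, window)


class AdaptiveAdversary:
    """Remembers the complete game history."""
    def __init__(self, strategy):
        self.strategy = strategy

    def next_move(self, round_no, history):
        return self.strategy(round_no, history)
```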

Page 8: Regret Minimization in Bounded Memory Games

GOAL
Define game model
Define notions of regret for the defender in the game model wrt different adversary models
Study the complexity of the regret minimization problem in this game model

Page 9: Regret Minimization in Bounded Memory Games

PRIOR WORK: REGRET MINIMIZATION FOR GAME MODELS

Regret Minimization is well studied in repeated games, including imperfect information (the bandit model) [AK04, McMahanB04, K05, FKM05, DH06]

The Defender compares his performance to the performance of the best expert in hindsight.

Traditional view: for the sake of comparison we assume that the adversary was oblivious.

Page 10: Regret Minimization in Bounded Memory Games

TALK OUTLINE
Motivation
Bounded Memory Game Model
Defining Regret
Regret Minimization in Bounded Memory Games

Page 11: Regret Minimization in Bounded Memory Games

TWO PLAYER STOCHASTIC GAMES

Two Players: Defender and Adversary

Transitions between states can be probabilistic, and may depend on the actions (a,b) taken by the Defender and the Adversary.

r(a,b,s) – payoff when actions (a,b) are played at state s
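For concreteness, a minimal sketch of a single step of such a stochastic game follows; only the shape of r(a,b,s) and the probabilistic transition mirror the slide, while the table-based representation and names are assumptions of ours.

```python
import random

def step(state, a, b, transitions, reward):
    """One step of a two-player stochastic game.

    reward[(a, b, state)]      -- the defender's payoff r(a, b, s)
    transitions[(state, a, b)] -- dict mapping next_state -> probability
    """
    r = reward[(a, b, state)]
    dist = transitions[(state, a, b)]
    next_state = random.choices(list(dist), weights=list(dist.values()))[0]
    return r, next_state
```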

Page 12: Regret Minimization in Bounded Memory Games

STOCHASTIC GAMES
Captures dependence of rewards on history

A fixed strategy is a function f mapping S (states) to actions A
Experts: the set of fixed strategies
Captures dependence of the defender's actions on history

Recall: the additional elements of the game model in the motivating example

Page 13: Regret Minimization in Bounded Memory Games

NOTATION
Number of states in the game: n
Number of actions: |A|
Number of experts: N = |A|^n
Total rounds of play: T
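The count N = |A|^n can be checked by enumerating the experts (fixed strategies) explicitly; the toy state and action names below are ours.

```python
from itertools import product

states = ["s1", "s2", "s3"]              # n = 3 (toy)
actions = ["Ignore", "Investigate"]      # |A| = 2 (toy)

# An expert is a fixed strategy: one action chosen for every state.
experts = [dict(zip(states, choice)) for choice in product(actions, repeat=len(states))]
assert len(experts) == len(actions) ** len(states)   # N = |A|^n = 8
```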

Page 14: Regret Minimization in Bounded Memory Games

THEOREM 1: REGRET MINIMIZATION IS IMPOSSIBLE FOR STOCHASTIC GAMES

Oblivious Strategy i: play b2 i times, then play b1
  Optimal Defender Strategy: play a1 every round

Oblivious Strategy: play b2 every turn
  Optimal Defender Strategy: play a2 every round

Page 15: Regret Minimization in Bounded Memory Games

OUR GAME MODEL
Definition of bounded memory games, a subclass of stochastic games

Memory-m games: states record the last m outcomes in the history of the game play

Page 16: Regret Minimization in Bounded Memory Games

BOUNDED MEMORY GAME: STATES & ACTIONS

Four states – record the last outcome (memory-1)

Defender Actions: {UP, DOWN}
Adversary Actions: {LEFT, RIGHT}

Page 17: Regret Minimization in Bounded Memory Games

BOUNDED MEMORY GAME: OUTCOMES

Four Outcomes:
1. (Up, Left)
2. (Up, Right)
3. (Down, Left)
4. (Down, Right)

• The outcome depends only on the current actions of the defender and adversary.
• It is independent of the current state.

Page 18: Regret Minimization in Bounded Memory Games

BOUNDED MEMORY GAME: OUTCOMES

Four States:
1. (Top, Left)
2. (Top, Right)
3. (Bottom, Left)
4. (Bottom, Right)

Page 19: Regret Minimization in Bounded Memory Games

BOUNDED MEMORY GAME: EXAMPLE GAME PLAY

Round     State           Defender   Adversary   Outcome
Round 1   Top, Left       Down       Left        (Bottom, Left)
Round 2   Bottom, Left    Down       Right       (Bottom, Right)
Round 3   Bottom, Right   …          …           …
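A minimal sketch of the memory-1 dynamics in this example: the next state is simply the outcome of the current round. The replay below reproduces rounds 1 and 2 of the table above; the helper name and representation are ours.

```python
def play_memory_one(defender_moves, adversary_moves, start_state=("Top", "Left")):
    """Replay a memory-1 bounded memory game: each outcome becomes the next state."""
    state, trace = start_state, []
    for a, b in zip(defender_moves, adversary_moves):
        outcome = ("Bottom" if a == "Down" else "Top",
                   "Right" if b == "Right" else "Left")
        trace.append((state, a, b, outcome))
        state = outcome                      # the state records the last outcome
    return trace

# Rounds 1-2 from the table above: the play ends in state (Bottom, Right).
for row in play_memory_one(["Down", "Down"], ["Left", "Right"]):
    print(row)
```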

Page 20: Regret Minimization in Bounded Memory Games

BOUNDED MEMORY GAMES: REWARDS

Defender sees reward 1 if:
  Adversary plays LEFT from a green state
  Adversary plays RIGHT from a blue state
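The green and blue states refer to a figure on the original slide that is not reproduced here. Purely for illustration, the sketch below hard-codes a hypothetical coloring (Top, Left as green and Top, Right as blue); the actual coloring is whatever the slide's figure shows.

```python
# Hypothetical coloring for illustration only -- the real colors are in the slide's figure.
GREEN_STATES = {("Top", "Left")}
BLUE_STATES = {("Top", "Right")}

def defender_reward(state, adversary_move):
    """Reward 1 if the adversary plays LEFT from a green state or RIGHT from a blue state."""
    if state in GREEN_STATES and adversary_move == "Left":
        return 1
    if state in BLUE_STATES and adversary_move == "Right":
        return 1
    return 0
```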

Page 21: Regret Minimization in Bounded Memory Games

TRADITIONAL REGRET IN BOUNDED MEMORY GAMES

Adversary strategy:
  Plays RIGHT from a green state
  Plays LEFT from a blue state

• Defender will never see a reward!
• In hindsight, it may look like the fixed strategy "always play UP" would have received a reward
• What view of regret makes sense?

Page 22: Regret Minimization in Bounded Memory Games

TRADITIONAL REGRET IN BOUNDED MEMORY GAMES

Round   State          Defender   Adversary   Reward
One     Top, Left      UP         Right       0
Two     Top, Right     Down       Left        0
Three   Bottom, Left   Down       Left        0
Four    Bottom, Left   UP         Left        0
Five    Top, Left      …          …           …

Page 23: Regret Minimization in Bounded Memory Games

TRADITIONAL REGRET IN BOUNDED MEMORY GAMES

Actual game play:

Round   State          Defender   Adversary   Reward
One     Top, Left      UP         Right       0
Two     Top, Right     Down       Left        0
Three   Bottom, Left   Down       Left        0
Four    Bottom, Left   UP         Left        0
Five    Top, Left      …          …           …

Hypothetical play of the fixed strategy "always play UP" against the same adversary moves:

Round   State        Defender   Adversary   Reward
One     Top, Left    UP         Right       0
Two     Top, Right   UP         Left        0
Three   Top, Left    UP         Left        1
Four    Top, Left    UP         Left        1

• The defender will never see a reward!
• In hindsight, it looks like the fixed strategy "always play UP" would have received reward 2
• What are other ways to compare our performance?

Page 24: Regret Minimization in Bounded Memory Games

COMPARING PERFORMANCE WITH THE EXPERTS

Actual Game: Defender vs. Real Adversary
Hypothetical Game (Fixed Strategy f): f vs. Hypothetical Adversary

Compare the performance of the defender in the actual game to the performance of f in the hypothetical game

Page 25: Regret Minimization in Bounded Memory Games

REGRET MODELS

                           Actual Adversary
Hypothetical Adversary     Oblivious    k-Adaptive    Adaptive
Oblivious
k-Adaptive                     –
Adaptive                       –             –

(– marks combinations that are not considered; the remaining cells are filled in on the following slides.)

Hypothetical Oblivious Adversary – hard-code the moves played by the actual adversary
Hypothetical k-Adaptive Adversary – hard-code the state of the actual adversary after each window of k moves
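A sketch of the "hard-code the moves" construction for the hypothetical oblivious adversary described above; the function shape is ours.

```python
def hypothetical_oblivious(observed_moves):
    """Build the hypothetical oblivious adversary: replay the actual adversary's
    observed moves as a fixed, hard-coded schedule, ignoring the history."""
    schedule = list(observed_moves)

    def adversary(round_no, history):
        return schedule[round_no]

    return adversary
```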

Page 26: Regret Minimization in Bounded Memory Games

REGRET IN BOUNDED MEMORY GAMES (REVISITED)

• In hindsight, our hypothetical adversary model is k-Adaptive (k = 5)
• What is our regret now?

Actual game play:

Round       One        Two        Three         Four       Five           …
State       Top,Left   Top,Right  Bottom,Left   Top,Left   Bottom,Right   …
Defender    Up         Down       Down          Down       Up             …
Adversary   Right      Left       Left          Left       Right          …
Reward      0          0          0             0          0              …

Hypothetical play of the fixed strategy "always play UP" against the hypothetical k-Adaptive adversary:

Round   State        Defender   Adversary   Reward
One     Top, Left    Up         Right       0
Two     Top, Right   Up         Left        0
Three   Top, Left    Up         Right       0
Four    Top, Right   Up         Left        0
Five    Top, Left    Up         Right       0
…       …            …          …           …

Actual Performance: 0
Performance of Expert: 0
Regret: 0

Page 27: Regret Minimization in Bounded Memory Games

MEASURING REGRET IN HINDSIGHT

View 1: hypothetical adversary is oblivious
  The adversary would have played the exact same moves played in the actual game.
  Traditional view of regret: repeated games [Blackwell56, Hannan57, LW89], etc.
  Impossible for Bounded Memory Games (Example)

View 2: hypothetical adversary is k-Adaptive
  The adversary would have used the exact same strategy during each window of k moves.

View 3: hypothetical adversary is fully adaptive
  The hypothetical adversary is the real adversary.

Page 28: Regret Minimization in Bounded Memory Games

REGRET MINIMIZATION ALGORITHMS

Regret Minimization Algorithm: Average Regret → 0 as T → ∞

Examples for repeated games:
  Weighted Majority Algorithm [LW89]: Average Regret O(((log N)/T)½)
  Bandit setting [ACFS02]: Average Regret O(((N log N)/T)½)
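For concreteness, here is a minimal sketch of a multiplicative-weights (Hedge-style) learner of the kind the weighted majority bound refers to, in the full-information repeated setting; the [0,1] loss range, the learning-rate choice, and the function names are standard assumptions of ours rather than details from the talk.

```python
import math
import random

def exponential_weights(N, T, expert_losses, eta=None):
    """Hedge-style multiplicative weights over N experts for T rounds.

    expert_losses(t) must return a list of N losses in [0, 1] for round t.
    Returns the algorithm's (randomized) cumulative loss.
    """
    eta = eta if eta is not None else math.sqrt(math.log(N) / T)   # standard tuning
    weights = [1.0] * N
    total_loss = 0.0
    for t in range(T):
        chosen = random.choices(range(N), weights=weights)[0]      # follow a random expert
        losses = expert_losses(t)
        total_loss += losses[chosen]
        # Experts that suffered more loss are down-weighted multiplicatively.
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    return total_loss
```

With eta tuned this way the expected average regret is O(((log N)/T)½), matching the bound quoted above.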

Page 29: Regret Minimization in Bounded Memory Games

REGRET MINIMIZATION IN REPEATED GAMES

Easy consequence of Theorem 2

                           Actual Adversary
Hypothetical Adversary     Oblivious    k-Adaptive    Adaptive
Oblivious
k-Adaptive                     –
Adaptive                       –             –            X

(blank = regret minimization possible, X = impossible, – = not applicable)

Page 30: Regret Minimization in Bounded Memory Games

REGRET MINIMIZATION IN STOCHASTIC GAMES

Theorem 1: No Regret Minimization Algorithm exists for the general class of stochastic games.

                           Actual Adversary
Hypothetical Adversary     Oblivious    k-Adaptive    Adaptive
Oblivious                      X             X            X
k-Adaptive                     –             X            X
Adaptive                       –             –            X

Page 31: Regret Minimization in Bounded Memory Games

THEOREM 1: REGRET MINIMIZATION IS IMPOSSIBLE FOR STOCHASTIC GAMES

Oblivious Strategy i: play b2 i times, then play b1
  Optimal Defender Strategy: play a1 every round

Oblivious Strategy: play b2 every turn
  Optimal Defender Strategy: play a2 every round

Page 32: Regret Minimization in Bounded Memory Games

REGRET MINIMIZATION IN BOUNDED MEMORY GAMES

Theorem 2 (Time Permitting)

                           Actual Adversary
Hypothetical Adversary     Oblivious    k-Adaptive    Adaptive
Oblivious                                    X            X
k-Adaptive                     –
Adaptive                       –             –            X

Page 33: Regret Minimization in Bounded Memory Games

REGRET MINIMIZATION IN BOUNDED MEMORY GAMES

Theorem 3: Unless RP = NP there is no efficient regret minimization algorithm for the general class of bounded memory games.

                           Actual Adversary
Hypothetical Adversary     Oblivious    k-Adaptive    Adaptive
Oblivious                    Hard            X            X
k-Adaptive                     –            Hard         Hard
Adaptive                       –             –            X

Page 34: Regret Minimization in Bounded Memory Games

THEOREM 3
Unless RP = NP there is no efficient Regret Minimization algorithm for Bounded Memory Games, even against an oblivious adversary.

Reduction from MAX 3-SAT (7/8 + ε) [Hastad01]
Similar to the reduction in [EKM05] for MDPs

Page 35: Regret Minimization in Bounded Memory Games

THEOREM 3: SETUP

Defender Actions A: {0,1} × {0,1}

m = O(log n)

States: two states for each variable
  S0 = {s1, …, sn}
  S1 = {s'1, …, s'n}

Intuition: a fixed strategy corresponds to a variable assignment

Page 36: Regret Minimization in Bounded Memory Games

THEOREM 3: OVERVIEW
The adversary picks a clause uniformly at random for the next n rounds
The Defender can earn reward 1 by satisfying this unknown clause in the next n rounds
The game will "remember" if a reward has already been given, so that the defender cannot earn a reward multiple times during the n rounds

Page 37: Regret Minimization in Bounded Memory Games

THEOREM 3: STATE TRANSITIONS

Adversary Actions B: {0,1} × {0,1,2,3}

b = (b1, b2)

g(a,b) = b1

f(a,b) = S1   if a2 = 1 or b2 = a1   (reward already given)
         S0   otherwise              (no reward given)

Page 38: Regret Minimization in Bounded Memory Games

THEOREM 3: REWARDS

b = (b1, b2)

No reward whenever B plays b2 = 2
No reward whenever s ∈ S1

r(a,b,s) =  1   if s ∈ S0 and a = b2
           -5   if s ∈ S1 and f(a,b) = S0 and b2 ≠ 3
            0   otherwise
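Read literally, the transition and reward rules of the reduction can be transcribed as follows; collapsing the state to its S0/S1 flag and reading "a = b2" as a1 = b2 are our interpretation of the slide, not something it states explicitly.

```python
# Transcription of the rules above, with the state collapsed to its S0/S1 flag.
# a = (a1, a2), a1, a2 in {0, 1};  b = (b1, b2), b1 in {0, 1}, b2 in {0, 1, 2, 3}.

def g(a, b):
    return b[0]                                    # g(a, b) = b1

def f(a, b):
    (a1, a2), (b1, b2) = a, b
    return "S1" if a2 == 1 or b2 == a1 else "S0"   # S1: reward already given

def r(a, b, s_flag):
    (a1, a2), (b1, b2) = a, b
    if s_flag == "S0" and b2 == a1:                # reading "a = b2" as a1 = b2
        return 1
    if s_flag == "S1" and f(a, b) == "S0" and b2 != 3:
        return -5                                  # punished for leaving S1 early
    return 0
```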

Page 39: Regret Minimization in Bounded Memory Games

THEOREM 3: OBLIVIOUS ADVERSARY
(d1, …, dn) – a binary De Bruijn sequence of order n

1. Pick a clause C uniformly at random
2. For i = 1, …, n: play b = (di, b2), where

      b2 = 1   if xi ∈ C
           0   if ¬xi ∈ C
           3   if i = n
           2   otherwise

3. Repeat from Step 1

Page 40: Regret Minimization in Bounded Memory Games

ANALYSIS

Defender can never be rewarded from s ∈ S1
Get Reward => Transition to s ∈ S1
Defender is punished for leaving S1, unless the adversary plays b2 = 3 (i.e. when i = n)

f(a,b) = S1   if a2 = 1 or b2 = a1
         S0   otherwise

r(a,b,s) =  1   if s ∈ S0 and a = b2
           -5   if s ∈ S1 and f(a,b) = S0 and b2 ≠ 3
            0   otherwise

Page 41: Regret Minimization in Bounded Memory Games

THEOREM 3: ANALYSIS
φ – an assignment satisfying a ρ fraction of clauses
fφ – average score ρ/n

Claim: no strategy (fixed or adaptive) can obtain an average expected score better than ρ*/n (where ρ* denotes the maximum fraction of simultaneously satisfiable clauses)

Regret Minimization Algorithm:
  Run until the expected average regret is < ε/n
  Then the expected average score is > (ρ* - ε)/n

Hence an efficient regret minimization algorithm would yield a (7/8 + ε)-approximation for MAX 3-SAT, which is impossible unless RP = NP [Hastad01].

Page 42: Regret Minimization in Bounded Memory Games

OPEN QUESTIONS
How hard is Regret Minimization against an oblivious adversary when A = {0,1} and m = O(log n)?
How hard is Regret Minimization against an oblivious adversary in the complete information model?

Page 43: Regret Minimization in Bounded Memory Games

QUESTIONS?
