
Bayesian and non-Bayesian Learning in Games

Ehud Lehrer

Tel Aviv University, School of Mathematical Sciences

Including joint works with: Ehud Kalai, Rann Smorodinsky, Eilon Solan.

Learning in Games

Informal definition of learning: a decentralized process that converges (in some sense) to (some) equilibrium.

Non-Bayesian learning: Players
• don’t have any initial belief about other players’ strategies
• don’t maximize their payoffs
• don’t take into account future payoffs

Convergence (of the empirical frequency) to an equilibrium of the ONE-SHOT GAME

Bayesian (rational) learning: Players do not start in equilibrium, but
• they have some initial belief about other players’ strategies
• they are rational: they maximize their payoffs
• they take into account future payoffs

Convergence in REPEATED GAME

Bayesian vs. non-Bayesian

Non-Bayesian learning: Players have no idea about other players’ actions. They don’t care to maximize payoffs.

Nature of results: the statistics of past actions look like an equilibrium of the one-shot game.

Bayesian learning: Players do not start in equilibrium, but they start with a “grain” of idea about what other players do.

Nature of results: players eventually play something close to an equilibrium of the repeated game.

Important tools

Non-Bayesian learning: approachability

Bayesian learning: merging of two probability measures along a filtration (an increasing sequence of σ-fields)

Both were initiated by Blackwell (the latter with Dubins)

Repeated Games with Vector Payoffs
• I = finite set of actions of player 1.
• J = finite set of actions of player 2.
• M = $(m_{i,j})$ = a payoff matrix. Entries are vectors in $\mathbb{R}^d$.

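A set F is approachable by player 1 if there is a strategy $\sigma$ s.t.

$$\forall \varepsilon > 0,\ \exists N,\ \forall \tau:\quad P_{\sigma,\tau}\Big(\sup_{n \ge N} d(\bar{x}_n, F) \ge \varepsilon\Big) < \varepsilon,$$

where $\bar{x}_n$ is the average payoff up to stage n.

A set F is excludable by player 2 if there is a strategy $\tau$ s.t.

$$\exists \delta > 0,\ \forall \varepsilon > 0,\ \exists N,\ \forall \sigma:\quad P_{\sigma,\tau}\Big(\inf_{n \ge N} d(\bar{x}_n, F) < \delta\Big) < \varepsilon.$$

There are sets which are neither approachable nor excludable.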

Approachability

Applications (a sample):
• No-regret (Hannan)
• Repeated games with incomplete information (Aumann-Maschler)
• Learning (Foster-Vohra, Hart-Mas-Colell)
• Manipulation of calibration tests (Foster-Vohra, Lehrer, Smorodinsky-Sandroni-Vohra)
• Generating generalized normal numbers (Lehrer)

Characterization of Approachable Sets

[Figure: the set F, a point x outside F, a closest point y in F, the line xy, the hyperplane perpendicular to xy that passes through y, and H(p0).]

A closed set $F \subseteq \mathbb{R}^d$ is a B-set if for every $x \notin F$ there is $y \in F$ that satisfies:
1. y is a closest point in F to x.
2. The hyperplane perpendicular to the line xy that passes through y separates between x and H(p), for some $p \in \Delta(I)$.

Here $m_{p,q} = \sum_{i,j} p_i\, m_{i,j}\, q_j$ and $H(p) = \{ m_{p,q} : q \in \Delta(J) \}$.
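As a minimal sketch of the B-set condition, the Python below computes the expected vector payoff against given mixed actions and tests whether a given p satisfies the hyperplane condition for a pair x, y. Since H(p) is the convex hull of the payoffs against player 2's pure actions, checking those vertices is enough. The function names and the 2×2 payoff matrix are hypothetical and chosen only for illustration.

```python
def expected_payoff(M, p, q):
    """Expected vector payoff: sum over i, j of p[i] * M[i][j] * q[j]."""
    d = len(M[0][0])
    out = [0.0] * d
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            for k in range(d):
                out[k] += p_i * M[i][j][k] * q_j
    return out


def h_vertices(M, p):
    """Extreme points of H(p): the payoffs against each pure action j."""
    n_j = len(M[0])
    pure = lambda j: [1.0 if c == j else 0.0 for c in range(n_j)]
    return [expected_payoff(M, p, pure(j)) for j in range(n_j)]


def separates(M, p, x, y, tol=1e-12):
    """True if the hyperplane through y with normal x - y has all of H(p)
    on the (weakly) opposite side from x."""
    normal = [x_k - y_k for x_k, y_k in zip(x, y)]
    return all(
        sum(n_k * (v_k - y_k) for n_k, v_k, y_k in zip(normal, v, y)) <= tol
        for v in h_vertices(M, p)
    )


# Hypothetical game: matching pennies in the first coordinate, d = 2.
M = [[(1.0, 0.0), (-1.0, 0.0)],
     [(-1.0, 0.0), (1.0, 0.0)]]
x, y = (0.3, 0.0), (0.0, 0.0)   # target F = {(0, 0)}, so y is the closest point
print(separates(M, [0.5, 0.5], x, y))  # uniform mixing keeps H(p) at the origin
print(separates(M, [1.0, 0.0], x, y))  # a pure action does not
```

In this toy game the uniform mixed action makes H(p) the single point (0, 0), so the condition holds for every x, which is one way to see that the point target is a B-set here.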

Characterization of Approachable Sets

Theorem [Blackwell, 1956]: every B-set F is approachable.

Theorem [Blackwell, 1956]: every convex set is either approachable or excludable.

Theorem [Hou, 1971; Spinat, 2002]: every minimal (w.r.t. set inclusion) approachable set is a B-set.

Or: A set is approachable if and only if it contains a B-set.

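The approaching strategy plays at each stage n a mixed action p such that H(p) and the average payoff $\bar{x}_n$ are separated by the hyperplane that passes through a closest point y to $\bar{x}_n$ in F and is perpendicular to the line connecting $\bar{x}_n$ and y. With this strategy:

$$E\big[d(\bar{x}_n, F)\big] \le \frac{2\|M\|}{\sqrt{n}},$$

where $\|M\|$ is the largest norm of an entry of M.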
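One concrete instance of the approaching strategy, sketched here under stated assumptions, is regret matching (Hart and Mas-Colell): the vector payoff is the regret vector, the target set is the nonpositive orthant, and the mixed action is proportional to the positive part of the average regret. To keep the run deterministic, this sketch accumulates the expected regret of the mixed action rather than sampling an action, which is a simplification and not part of the slides. The function names and the matching-pennies payoffs are hypothetical.

```python
import math


def regret_vector(U, p, j):
    """Expected regret against column j: entry k is U[k][j] - sum_i p[i] * U[i][j]."""
    got = sum(p_i * U[i][j] for i, p_i in enumerate(p))
    return [U[k][j] - got for k in range(len(U))]


def approach_orthant(U, opponent_moves):
    """Return the distance of the average regret from the nonpositive orthant
    after each stage, when player 1 mixes proportionally to positive regret."""
    n_i = len(U)
    total = [0.0] * n_i              # running sum of stage regret vectors
    dists = []
    for n, j in enumerate(opponent_moves, start=1):
        # Direction from the closest point of the orthant to the current sum.
        pos = [max(t, 0.0) for t in total]
        s = sum(pos)
        p = [v / s for v in pos] if s > 0 else [1.0 / n_i] * n_i
        r = regret_vector(U, p, j)
        total = [t + r_k for t, r_k in zip(total, r)]
        avg = [t / n for t in total]
        dists.append(math.sqrt(sum(max(a, 0.0) ** 2 for a in avg)))
    return dists


# Hypothetical example: matching pennies against an arbitrary fixed sequence.
U = [[1.0, -1.0], [-1.0, 1.0]]
moves = [(t * t + t // 3) % 2 for t in range(10000)]
dists = approach_orthant(U, moves)
print(dists[0], dists[-1])
```

With this choice of p the stage regret is orthogonal to the positive part of the accumulated regret, so in this deterministic variant the distance after n stages is at most R/√n, where R bounds the norm of a stage regret vector (here R ≤ 2√2). This mirrors the 1/√n rate in the bound above.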

Bounded Computational Capacity

A strategy is k-bounded-recall if it depends only on the last k pairs of actions (and it does not depend on previously played actions).
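A k-bounded-recall strategy can be sketched as a rule applied to a sliding window of the last k action pairs. The class below is an illustrative sketch only: the class name, the pure-action rule (a mixed rule would return a distribution instead), and the fictitious initial memory used for the first k stages are all assumptions not taken from the slides.

```python
from collections import deque


class BoundedRecallStrategy:
    """Chooses the next action from the last k (own, other) action pairs only."""

    def __init__(self, k, rule, initial_memory):
        if len(initial_memory) != k:
            raise ValueError("initial_memory must hold exactly k action pairs")
        # deque(maxlen=k) silently drops the oldest pair when a new one arrives.
        self.memory = deque(initial_memory, maxlen=k)
        self.rule = rule

    def next_action(self):
        return self.rule(tuple(self.memory))

    def observe(self, own_action, other_action):
        self.memory.append((own_action, other_action))


# Hypothetical 1-bounded-recall rule: repeat the opponent's last action.
copy_last = BoundedRecallStrategy(
    k=1, rule=lambda mem: mem[-1][1], initial_memory=[("a", "a")]
)
played = []
for other in ["b", "b", "a"]:
    mine = copy_last.next_action()
    played.append(mine)
    copy_last.observe(mine, other)
print(played)
```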

A (non-deterministic) automaton is given by:
• A finite state space.
• A probability distribution over states, according to which the initial state is chosen.
• A set of inputs (say, the set I × J of action pairs).
• A set of outputs (say, I, the set of player 1’s actions).
• A rule that assigns to each state a probability distribution over outputs.
• A transition rule that assigns to every state and every input a probability distribution over the next state.
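The components of the automaton listed above can be sketched as a small Python class. This is an illustrative sketch, not an implementation from the slides: the class name, the dictionary encoding of distributions, and the seed parameter are assumptions. The example uses degenerate distributions so that its behaviour does not depend on the random draws.

```python
import random


class Automaton:
    """Non-deterministic automaton; every distribution is a dict outcome -> probability."""

    def __init__(self, initial, output, transition, seed=0):
        self.rng = random.Random(seed)
        self.output = output            # state -> distribution over outputs
        self.transition = transition    # (state, input) -> distribution over states
        self.state = self._draw(initial)

    def _draw(self, dist):
        outcomes = list(dist)
        weights = [dist[o] for o in outcomes]
        return self.rng.choices(outcomes, weights=weights)[0]

    def act(self):
        """Draw an output (an action of player 1) from the current state's rule."""
        return self._draw(self.output[self.state])

    def step(self, action_pair):
        """Move to the next state given the input, an action pair in I x J."""
        self.state = self._draw(self.transition[(self.state, action_pair)])


# Hypothetical two-state automaton that remembers the opponent's last action.
actions = ["a", "b"]
auto = Automaton(
    initial={"sa": 1.0},
    output={"sa": {"a": 1.0}, "sb": {"b": 1.0}},
    transition={
        (s, (i, j)): {"s" + j: 1.0}
        for s in ["sa", "sb"] for i in actions for j in actions
    },
)
outputs = []
for other in ["b", "b", "a"]:
    mine = auto.act()
    outputs.append(mine)
    auto.step((mine, other))
print(outputs)
```

This particular automaton plays the opponent's previous action, so it implements a 1-bounded-recall rule with two states.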

Approachability and Bounded Capacity

A set F is approachable with bounded-recall strategies by player 1 if for every $\varepsilon > 0$, the set $B(F, \varepsilon) := \{ y : d(y, F) \le \varepsilon \}$ is approachable by some bounded-recall strategy.

A set F is excludable against bounded-recall strategies by player 2 if player 2 has a strategy $\tau$ such that

$$\exists \delta > 0,\ \forall \varepsilon > 0,\ \forall \text{ bounded-recall } \sigma,\ \exists N:\quad P_{\sigma,\tau}\Big(\inf_{n \ge N} d(\bar{x}_n, F) < \delta\Big) < \varepsilon.$$

Main Theorem

Theorem (w/ Eilon Solan): The following statements are equivalent for closed sets.
1. The set F is approachable with bounded-recall strategies.
2. The set F is approachable with automata.
3. The set F contains a convex approachable set.
4. The set F is not excludable against bounded-recall strategies.

4 points to note

1. A set is approachable with automata if and only if it is approachable by bounded-recall strategies.

2. A complete characterization of sets that are approachable with bounded-recall strategies.

3. A set which is not approachable with bounded-recall strategies, is excludable against all bounded-recall strategies.

4. We do not know whether the same holds for automata.

Example

(-1,1)    (1,-1)
(1,1)     (-1,-1)

On board

Good news: in applications target sets are convex (a point, or a whole positive or negative orthant).

Approachability in Hilbert space
• I = finite set of actions of player 1.
• J = finite set of actions of player 2.
• M = $(m_{i,j})$ = a payoff matrix. Entries are points in a Hilbert space (random variables).

All may change with the stage n.

Advantage: allows for infinitely many constraints

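A set F is approachable by player 1 if there is a strategy $\sigma$ s.t.

$$\forall \varepsilon > 0,\ \exists N,\ \forall \tau:\quad P_{\sigma,\tau}\Big(\sup_{n \ge N} d(\bar{x}_n, F) \ge \varepsilon\Big) < \varepsilon.$$

Theorem: Suppose that at stage n, the average payoff is $\bar{x}_n$ and y is a closest point in F to $\bar{x}_n$. If the hyperplane perpendicular to the line $\bar{x}_n y$ that passes through y separates between $\bar{x}_n$ and H(p), for some $p \in \Delta(I)$, then F is approachable.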

Approachability and law of large numbers

$X_1, X_2, \ldots$ are uncorrelated r.v.’s with $E(X_i) = 0$. $E(X_i X_j)$ is the dot product.

F is $\{0\}$.

At any stage n, the average $\bar{X}_n$ satisfies $E(\bar{X}_n X_{n+1}) = 0$.

[Figure: F = {0}, the average $\bar{X}_n$, and $X_{n+1}$.]

The game: each player has only one action. The payoff at stage n is $X_n$. Thus, F is approachable. This is the strong law of large numbers. (When the payoffs are not uniformly bounded, there is an additional boundedness condition.)
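The geometric condition behind this slide can be checked exactly on a finite probability space. The sketch below is an illustration under assumptions not in the slides: it uses Rademacher-type variables on a uniform space with 2^m points, verifies that the average of the first n variables is orthogonal to the next one, and computes the L² distance of the average from 0. It illustrates the orthogonality condition and the norm of the average, not almost-sure convergence. All names are hypothetical.

```python
import math


def rademacher(i, m):
    """X_i on Omega = {0, ..., 2**m - 1} with uniform probability: +1 or -1 by bit i."""
    return [1.0 if (w >> i) & 1 == 0 else -1.0 for w in range(2 ** m)]


def inner(X, Y):
    """Dot product E(XY) under the uniform probability on Omega."""
    return sum(x * y for x, y in zip(X, Y)) / len(X)


def average(Xs):
    """Pointwise average of a list of random variables."""
    return [sum(col) / len(Xs) for col in zip(*Xs)]


m = 8
Xs = [rademacher(i, m) for i in range(m)]

# Orthogonality of the running average to the next payoff.
cross = [inner(average(Xs[:n]), Xs[n]) for n in range(1, m)]

# Distance of the running average from F = {0} in the L2 norm.
norms = [math.sqrt(inner(average(Xs[:n]), average(Xs[:n]))) for n in range(1, m + 1)]
print(norms)
```

For these orthonormal variables the norm of the average of the first n equals 1/√n, so the distance to F = {0} shrinks at the same 1/√n rate as in the finite-dimensional bound.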

Problem: Approachability in norm spaces.

Activeness function

At stage n the characteristic function $K_n$ indicates which coordinates are active and which are not.

H is $L^2$ (even over a finite probability space).

The average payoff at stage n is

$$\bar{X}_n = \frac{\sum_{t=1}^{n} K_t X_t}{\sum_{t=1}^{n} K_t}.$$
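On a finite probability space, the activeness-weighted average can be computed coordinate by coordinate: each coordinate averages only over the stages at which it was active. The sketch below is illustrative; the function name is hypothetical, and returning None for a coordinate that has never been active is an assumption, since the slides do not say how that case is handled.

```python
def active_average(Ks, Xs):
    """Coordinatewise sum_t K_t * X_t / sum_t K_t.

    Ks[t][w] is 1 if coordinate w is active at stage t and 0 otherwise;
    Xs[t][w] is the payoff at stage t in coordinate w.
    """
    size = len(Xs[0])
    num = [0.0] * size
    den = [0] * size
    for K, X in zip(Ks, Xs):
        for w in range(size):
            num[w] += K[w] * X[w]
            den[w] += K[w]
    # Assumption: a coordinate with no active stage so far has no average yet.
    return [num[w] / den[w] if den[w] > 0 else None for w in range(size)]


# Hypothetical data: three stages, two coordinates.
Ks = [[1, 0], [1, 1], [0, 1]]
Xs = [[2.0, 9.0], [4.0, 1.0], [7.0, 3.0]]
print(active_average(Ks, Xs))
```

In the example, the first coordinate averages stages 1 and 2 and the second averages stages 2 and 3; payoffs at inactive stages are ignored.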

Applications:
1. repeated games with incomplete information: different games are active at different times
2. construction of normal numbers
3. manipulability of many calibration tests
4. general no-regret theorem (against many replacing schemes)
5. convergence to correlated equilibrium along many sequences

Activeness function – cont.

Theorem: suppose that F is convex. Let $y_n$ be the closest point in F to the average payoff at time n, $\bar{X}_n$. If the hyperplane perpendicular to the line

$$\frac{K_{n+1}}{\sum_{t=1}^{n+1} K_t}\,(\bar{X}_n - y_n)$$

that passes through $y_n$ separates between $\bar{X}_n$ and H(p), for some $p \in \Delta(I)$, then F is approachable.
