Adaptively Learning Tolls to Induce Target Flows
Aaron Roth (joint work with Jon Ullman and Steven Wu)


Page 1: Adaptively Learning Tolls to Induce Target Flows

Aaron Roth
Joint work with Jon Ullman and Steven Wu


Page 3: Non-Atomic Congestion Games

• A graph $G = (V, E)$ representing a road network
• A latency function $\ell_e(\cdot)$ on each edge $e$
• Infinitely many (infinitesimally small) players
• For each commodity $i$, a mass $r_i$ of players who want to route flow between $s_i$ and $t_i$
• Actions for players are paths in the graph
• An action profile induces a multi-commodity flow $f$

Example latency functions:
$\ell_1(x) = x^2$,  $\ell_2(x) = x^3 + 3$,  $\ell_3(x) = \log x \cdot \log\log x$,  $\ell_4(x) = 2^x$,  $\ell_5(x) = 1$

Page 4: Equilibrium Flows

• "Nobody can improve their total latency by switching paths."
• Let $\mathcal{P}_i$ denote the set of $s_i$-$t_i$ paths in the graph.
• Let $\ell_P(f) = \sum_{e \in P} \ell_e(f_e)$ denote the latency of path $P$ under flow $f$.

Definition: A feasible multi-commodity flow $f$ is a Wardrop equilibrium if for every commodity $i$ with $r_i > 0$, and for all paths $P, P' \in \mathcal{P}_i$ with $f_P > 0$:  $\ell_P(f) \le \ell_{P'}(f)$.

Page 5: Routing games are potential games

• Equilibrium flows minimize the following potential function, among all feasible multi-commodity flows:

$$\Phi(f) = \sum_{e \in E} \int_0^{f_e} \ell_e(x)\, dx$$

• Convex so long as the $\ell_e$ are non-decreasing
• Strongly convex if the $\ell_e$ are strictly increasing, in which case the equilibrium is unique
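An illustrative sketch (not from the talk) of the potential-minimization view: it computes the Wardrop equilibrium of a tiny two-edge network by minimizing $\Phi$. The network, the latencies $\ell_1(x) = x$ and $\ell_2(x) = 1$, and the grid search (standing in for a proper convex solver) are all my assumed choices.

```python
# Illustrative sketch: compute the Wardrop equilibrium of a two-edge network by
# minimizing the potential  Phi(f) = sum_e integral_0^{f_e} ell_e(x) dx.
# Assumed example: one unit of flow from s to t over two parallel edges with
# ell_1(x) = x and ell_2(x) = 1; a feasible flow is described by x = flow on edge 1.

import numpy as np

def potential(x):
    # integral_0^x ell_1 = x^2 / 2,   integral_0^{1-x} ell_2 = (1 - x)
    return x**2 / 2 + (1 - x)

# Minimize the (convex) potential over the feasible set [0, 1] by grid search.
grid = np.linspace(0.0, 1.0, 100001)
x_eq = grid[np.argmin(potential(grid))]

print(f"equilibrium flow on edge 1: {x_eq:.2f}")         # 1.00: everyone takes edge 1
print(f"path latencies at equilibrium: {x_eq:.2f} vs 1")  # equal, as Wardrop requires
```

At the minimizer all flow uses edge 1, where the latency equals that of the alternative path, so no player can improve by switching.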

Page 6: Manipulating equilibrium flow (classic problem)

• Suppose you can set a toll $t_e \ge 0$ on each edge $e$.
• The potential function becomes $\Phi_t(f) = \sum_{e \in E} \left( \int_0^{f_e} \ell_e(x)\, dx + t_e f_e \right)$
• This changes the equilibrium flow.
• Goal: set tolls to induce some target flow $g$ in equilibrium
  • E.g. the socially optimal flow
  • Always possible
  • Computationally tractable
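Continuing the same assumed example (my illustration, not from the talk): adding a toll to the potential shifts the equilibrium, and the toll $t_1 = 1/2$ is enough to induce the socially optimal half-and-half split as the new equilibrium.

```python
# Illustrative sketch continued: the tolled potential is
#     Phi_t(f) = sum_e ( integral_0^{f_e} ell_e(x) dx + t_e * f_e ).
# With tolls t = (0.5, 0), the equilibrium of the two-edge example moves from
# (1, 0) to the socially optimal target flow g = (0.5, 0.5).

import numpy as np

def tolled_potential(x, t1, t2):
    # x = flow on edge 1,  1 - x = flow on edge 2
    return (x**2 / 2 + t1 * x) + (1 - x) * (1 + t2)

grid = np.linspace(0.0, 1.0, 100001)
for t1 in (0.0, 0.5):
    x_eq = grid[np.argmin(tolled_potential(grid, t1, 0.0))]
    print(f"toll on edge 1 = {t1:.1f}  ->  equilibrium flow on edge 1 = {x_eq:.2f}")
# toll 0.0 -> 1.00,   toll 0.5 -> 0.50
```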

Page 7: A Natural Problem [Bhaskar, Ligett, Schulman, Swamy FOCS 14]

• You don't know the latency functions...
• But you have the power to set tolls and see what happens.
• Agents play the equilibrium flow given your tolls.
  • i.e. you have access to an oracle that takes a toll vector $t$ and returns $f(t)$, the equilibrium flow given $t$.
• Want to learn tolls that induce some target flow $g$ in polynomially many rounds.

Page 8: The [BLSS] solution in a nutshell

• Assume the latency functions are convex polynomials of fixed degree; the only unknowns are the coefficients (e.g. $\ell_e(x) = a_e x^2 + b_e x + c_e$).
• Write down a convex program to solve for coefficients and tolls that induce the target flow, using the Ellipsoid method.
• Every day, use the tolls at the centroid of the current ellipsoid.
• If the induced flow is not the target flow, you can find a separating hyperplane.
• So: the number of rounds to convergence is at most the running time of Ellipsoid.

Page 9: The [BLSS] solution in a nutshell

• Very neat! (Read the paper!)
• A couple of limitations:
  • Latency functions must have a simple, known form: the only unknowns are a small number of coefficients.
  • Latency functions must be convex.
  • Heavy machinery: have to run Ellipsoid. Computationally intensive. Centralized.

Page 10: Some desiderata

• Remove assumptions on latency functions:
  • No known parametric form
  • Not necessarily convex
  • Not necessarily Lipschitz
• Make the update elementary
  • Ideally decentralized

Page 11: Proposed algorithm: Tâtonnement

1. Initialize $t_e^1 = 0$ for all edges $e$. Let $j = 1$.
2. Observe the equilibrium flow $f^j = f(t^j)$.
3. While $f^j$ is not (approximately) the target flow $g$:
   1. For each edge $e$ set: $t_e^{j+1} = t_e^j + \eta\,(f_e^j - g_e)$, projected onto the allowed toll range.
   2. Set $j = j + 1$ and observe $f^j = f(t^j)$.

• Natural, simple, distributed.
• Why should it work? (A runnable sketch of this loop appears after this slide.)
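A runnable sketch of the loop above on the same assumed two-edge example (my code, not the authors'): the equilibrium oracle $f(t)$ is implemented by minimizing the tolled potential, and the step size $\eta$ and toll cap are illustrative choices.

```python
# Tatonnement sketch (illustrative): raise the toll on edges carrying more flow
# than the target, lower it on edges carrying less, and let players re-equilibrate.
# Assumed network: two parallel s-t edges, ell_1(x) = x, ell_2(x) = 1, total mass 1.
# Target flow g = (0.5, 0.5), the socially optimal split.

import numpy as np

def equilibrium_oracle(t):
    """Return the equilibrium flow f(t): minimize the tolled potential by grid search."""
    x = np.linspace(0.0, 1.0, 100001)                    # candidate flow on edge 1
    phi = x**2 / 2 + t[0] * x + (1 - x) * (1 + t[1])     # tolled potential
    x_eq = x[np.argmin(phi)]
    return np.array([x_eq, 1 - x_eq])

g = np.array([0.5, 0.5])        # target flow
t = np.zeros(2)                 # initial tolls
eta, t_max = 0.2, 10.0          # step size and toll cap (illustrative)

for j in range(60):
    f = equilibrium_oracle(t)                         # observe today's flow
    t = np.clip(t + eta * (f - g), 0.0, t_max)        # tatonnement / projected GD step

print("final tolls:  ", np.round(t, 3))                       # approx [0.5, 0.0]
print("induced flow: ", np.round(equilibrium_oracle(t), 3))   # approx [0.5, 0.5]
```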

Page 12: Reverse engineering why it should work

• View the interaction as repeated play of a game between a toll player and a flow player.
• What game are they playing?
  • The flow player's strategies are feasible flows; the toll player's strategies are tolls in some bounded range.
  • What are their cost functions?
• How are they playing it?

Page 13: Behavior of the flow player is clear

• Every day, the flow player plays the equilibrium flow given the current tolls:

$$f^j = f(t^j) = \arg\min_{\text{feasible } f} \; \Phi_{t^j}(f)$$

• If we define the flow player's cost to be:

$$c_F(f, t) = \sum_{e \in E} \left( \int_0^{f_e} \ell_e(x)\, dx \; + \; t_e f_e \right)$$

then the flow player is playing a best response every day.

Page 14: Behavior of the toll player?

• Consistent with playing online (projected) gradient descent with the loss function:

$$L_j(t) = \sum_{e \in E} t_e\,(g_e - f_e^j)$$

• whose gradient $(g_e - f_e^j)_{e \in E}$ is the gradient, in $t$, of the cost function:

$$c_T(t, f) = \sum_{e \in E} t_e\,(g_e - f_e)$$
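Spelled out in my notation (with $\Pi$ the projection onto the allowed toll range), one step of projected online gradient descent on the loss $L_j$ is exactly the toll update from the algorithm on Page 11:

$$t_e^{j+1} \;=\; \Pi\!\left[\, t_e^{j} - \eta\,\frac{\partial L_j}{\partial t_e}(t^{j}) \right] \;=\; \Pi\!\left[\, t_e^{j} - \eta\,(g_e - f_e^{j}) \right] \;=\; \Pi\!\left[\, t_e^{j} + \eta\,(f_e^{j} - g_e) \right].$$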

Page 15: So the algorithm is consistent with...

• Repeated play of the following game:
  • The flow player chooses a feasible flow $f$ and pays $c_F(f, t)$.
  • The toll player chooses tolls $t$ in a bounded range and pays $c_T(t, f)$.
• Where, in rounds:
  • The toll player updates his strategy with online gradient descent, and
  • The flow player best responds.

Page 16: Questions

• Does the equilibrium of this game correspond to the target flow?
• Does this repeated play converge (quickly) to equilibrium?

Page 17: Questions

• Does the equilibrium of this game correspond to the target flow? Yes.
• Suppose tolls $t$ induce the target flow, i.e. $f(t) = g$. Then $(t, g)$ is a Nash equilibrium: the flow player is best responding by definition, and the toll player's cost $\sum_e t_e (g_e - f_e) = 0$ no matter which tolls he plays, so $t$ is a best response too.
• Conversely, suppose $(t, f)$ is a Nash equilibrium. Then $f_e \le g_e$ for every edge $e$: otherwise there is an edge with $f_e > g_e$, and the toll player would set $t_e$ to the maximum allowed toll.
• Thinking a little bit harder, the edges with $f_e < g_e$ can be handled similarly, so at equilibrium $f = g$.

Page 18: Questions

• Does this repeated play converge (quickly) to equilibrium?
• It does in zero-sum games! [Freund and Schapire '96]
  • The actual strategies of the gradient-descent player, together with the empirical average of the best responder's strategies, converge to approximate min-max play.
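The fact being invoked, stated informally in my notation for a zero-sum payoff $C(t, f)$ that is linear in $t$ and concave in $f$ (as it will be here): if the $t$-player's total regret after $T$ rounds is $R(T)$ and the $f$-player best responds every round, then the $t$-player's time-averaged strategy $\bar t$ satisfies

$$\max_{f}\; C(\bar t, f) \;\le\; \min_{t}\max_{f}\; C(t, f) \;+\; \frac{R(T)}{T},$$

i.e. $\bar t$ is an $R(T)/T$-approximate min-max strategy, and a symmetric statement holds for the empirical average of the best responder's play.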

Page 19: So it converges in a zero sum game

Page 20: Strategic Equivalence

• Adding a strategy-independent term to a player's cost function does not change that player's best-response function
• ...and so doesn't change the equilibria of the game.

Page 21: So it converges in a zero sum game

• Add the strategy-independent term $-\sum_{e \in E} g_e t_e$ to the flow player's cost (it does not depend on the flow player's strategy $f$).

Page 22: So it converges in a zero sum game

• Add $-\sum_{e \in E} g_e t_e$ to the flow player's cost, and $-\sum_{e \in E} \int_0^{f_e} \ell_e(x)\, dx$ to the toll player's cost; each added term is independent of that player's own strategy, so best responses and equilibria are unchanged.
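Writing out the transformed costs (my reconstruction from the terms shown on this slide and the cost functions on Pages 13-15): the flow player's cost becomes the negative of the toll player's, so the game is zero sum with payoff

$$C(t, f) \;=\; \sum_{e \in E} t_e\,(g_e - f_e) \;-\; \sum_{e \in E} \int_0^{f_e} \ell_e(x)\, dx,$$

which the toll player minimizes and the flow player maximizes.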


Page 24: So the dynamics converge!

• So by the regret bound of online gradient descent: the toll player reaches an $\epsilon$-approximate min-max strategy in $T$ rounds, for $\epsilon = O(1/\sqrt{T})$ (hiding the dependence on the toll bound and the size of the network).
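For reference, the standard bound being used (my notation: $D$ bounds the diameter of the toll space, $G$ bounds the gradient norms $\|g - f^j\|$): projected online gradient descent with step size $\eta = D/(G\sqrt{T})$ guarantees

$$\sum_{j=1}^{T} L_j(t^{j}) \;-\; \min_{t}\,\sum_{j=1}^{T} L_j(t) \;\le\; D\,G\,\sqrt{T},$$

so dividing by $T$ and applying the zero-sum argument above, the toll player's play is an $\epsilon$-approximate min-max strategy with $\epsilon = DG/\sqrt{T}$.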

Page 25: Do approximate min-max tolls guarantee an approximately target flow?

• Yes! Recall that if the latency functions are strictly increasing, then the potential $\Phi_t(f)$ is strongly convex in $f$ for all $t$, so an approximate minimizer is close to the exact equilibrium flow.

Page 26: Upshot

So: for any fixed class of latency functions such that:
1. Each $\ell_e$ is bounded in a fixed range, and
2. Each $\ell_e$ is strictly increasing,

this simple process results in a flow $f^T$ that is within $\epsilon$ of the target flow $g$ after polynomially many rounds.
• Can get better bounds with further assumptions
  • E.g. the $\ell_e$ are Lipschitz continuous

Page 27: Questions

• Exact convergence without assumptions on latency functions?
• Extensions to other games?
  • Seem to crucially use the fact that equilibrium is the solution to a convex optimization problem...

Page 28: Thank You!