
Regret to the Best vs. Regret to the Average

Eyal Even-Dar, Computer and Information Science, University of Pennsylvania. Collaborators: Michael Kearns (Penn), Yishay Mansour (Tel Aviv), Jenn Wortman (Penn)


Page 1: Title

Regret to the Best vs. Regret to the Average

Eyal Even-Dar, Computer and Information Science, University of Pennsylvania

Collaborators: Michael Kearns (Penn), Yishay Mansour (Tel Aviv), Jenn Wortman (Penn)

Page 2: The No-Regret Setting

• Learner maintains a weighting over N “experts”

• On each of T trials, the learner observes the payoffs of all N experts

– Payoff to the learner = weighted payoff

– Learner then dynamically adjusts weights

• Let R_i,T be the cumulative payoff of expert i on some sequence of T trials

• Let R_A,T be the cumulative payoff of learning algorithm A

• Classical no-regret results: We can produce a learning algorithm A such that on any sequence of trials,

R_A,T > max_i {R_i,T} - sqrt(log(N)*T)

– “No regret”: per-trial regret sqrt(log(N)/T) approaches 0 as T grows

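To make the protocol above concrete, here is a minimal Python sketch of the experts setting (illustrative only, not from the slides; run_experts and weight_update are made-up names): the learner keeps a weight vector, earns the weighted payoff on each trial, then updates, and we report the regret to the best expert and to the average.

```python
import numpy as np

def run_experts(payoffs, weight_update):
    """Run a weight-based learner on a T x N matrix of expert payoffs in [0, 1].

    weight_update(w, gains) returns the new weight vector after one trial.
    """
    T, N = payoffs.shape
    w = np.ones(N) / N                     # start from the uniform weighting
    learner_gain = 0.0
    for t in range(T):
        learner_gain += w @ payoffs[t]     # payoff to the learner = weighted payoff
        w = weight_update(w, payoffs[t])   # learner then dynamically adjusts weights
    totals = payoffs.sum(axis=0)           # R_i,T for each expert i
    return {"regret_to_best": totals.max() - learner_gain,
            "regret_to_average": totals.mean() - learner_gain}
```

Note that a weight_update that never changes w already makes regret_to_average exactly zero, which is the "no learning is required" remark on the next page.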

Page 3: This Work

• We simultaneously examine:
– Regret to the best expert in hindsight
– Regret to the average return of all experts

• Note that no learning is required to match the average alone (just keep the uniform weighting)!

• Why look at the average?
– A “safety net” or “sanity check”
– A simple benchmark that a good algorithm should outperform
– Future direction: S&P 500

• We assume a fixed horizon T
– But this can easily be relaxed…

Page 4: Our Results

• Every difference-based algorithm with O(T^α) regret to the best expert has Ω(T^(1-α)) regret to the average

• There is a simple difference-based algorithm achieving this tradeoff

• Every algorithm with O(T^(1/2)) regret to the best expert must have Ω(T^(1/2)) regret to the average

• We can produce an algorithm with O(log(T)·T^(1/2)) regret to the best and O(1) regret to the average

Page 5: Oscillations: The Cost of an Update

• Consider 2 experts with instantaneous gains in {0,1}
• Let w be the weight on the first expert and initialize w = ½
• Suppose expert 1 gets a gain of 1 on the first time step, and expert 2 gets a gain of 1 on the second…
• Suppose the algorithm reacts to the first step by moving the weight on expert 1 from w up to w + Δ

• Best, worst, and average all earn 1
• The algorithm earns w + (1 - w - Δ) = 1 - Δ
• Regret to Best = Regret to Worst = Regret to Average = Δ

[Figure: the gains (1,0) move the weight from w to w + Δ, just before the gains (0,1) arrive]
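As a quick sanity check of this two-step example (a toy calculation, not from the deck; delta stands for the update Δ):

```python
def two_step_regret(w=0.5, delta=0.1):
    # Gains are (1,0) on step 1 and (0,1) on step 2; the learner moves w -> w + delta
    # in between. Best, worst, and average all earn 1.
    algo_gain = w + (1 - (w + delta))   # earns w, then 1 - (w + delta)
    return 1.0 - algo_gain              # equals delta, independent of w

print(two_step_regret())                # prints delta (0.1), up to floating-point rounding
```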

Page 6: A Bad Sequence

• Consider the following sequence:
– Expert 1: 1,0,1,0,1,0,1,0,…,1,0
– Expert 2: 0,1,0,1,0,1,0,1,…,0,1

• We can examine w over time for existing algorithms…

• Follow the Perturbed Leader: ½, ½ + 1/(T(1+ln(2)))^(1/2) - 1/(2T), ½, ½ + 1/(T(1+ln(2)))^(1/2) - 1/(2T), ½, …

• Weighted Majority: ½, ½ + (ln(2)/(2T))^(1/2)/(1 + (ln(2)/(2T))^(1/2)), ½, ½ + (ln(2)/(2T))^(1/2)/(1 + (ln(2)/(2T))^(1/2)), ½, …

• Both will lose to best, worst, and average

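A small simulation of this slide (a sketch reusing the run_experts helper from the Page 2 example; the Hedge-style multiplicative update and the learning rate simply follow the Weighted Majority parameterization quoted above, and the horizon is an arbitrary choice):

```python
import numpy as np

T = 10_000
eta = np.sqrt(np.log(2) / (2 * T))       # the (ln(2)/2T)^(1/2) rate quoted above

payoffs = np.zeros((T, 2))               # expert 1: 1,0,1,0,...   expert 2: 0,1,0,1,...
payoffs[0::2, 0] = 1
payoffs[1::2, 1] = 1

def wm_update(w, gains):
    w = w * np.exp(eta * gains)          # multiplicative (Weighted Majority / Hedge) update
    return w / w.sum()

print(run_experts(payoffs, wm_update))   # both reported regrets come out positive
```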

Page 7: A Simple Trade-off: The Ω(T) Barrier

• Again, consider 2 experts with instantaneous gains in {0,1}
• Let w be the weight on the first expert and initialize w = ½
• We first examine algorithms that depend only on the cumulative difference in payoffs
– The insight holds more generally for aggressive updating

• Feed such an algorithm the sequence (1,0), (1,0), (1,0), …: if w has not reached 2/3 after L steps, the regret to the best is already > L/3
• So w must climb from ½ to 2/3 within L steps, which forces some single update Δ_t > 1/(6L)
• Now make the sequence oscillate at the difference level where that large update occurs: over T steps, the regret to the average is ~ (T/2)·(1/(6L)) = Ω(T/L)

Regret to Best * Regret to Average ~ Ω(T)!

Page 8: Exponential Weights [F94]

• Unnormalized weight on expert i at time t: w_i,t = e^(η·R_i,t)

• Define W_t = ∑_i w_i,t, so the probability placed on expert i is p_i,t = w_i,t / W_t
• Let N be the number of experts

• Setting η = O(1/T^(1/2)) achieves O(T^(1/2)) regret to the best

• Setting η = O(1/T^(1/2+α)) achieves O(T^(1/2+α)) regret to the best

• It can be shown that with η = O(1/T^(1/2+α)) the regret to the average is O(T^(1/2-α))

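A minimal sketch of the exponential weights rule defined on this slide (the max-subtraction is only for numerical stability and is an implementation choice, not something from the deck):

```python
import numpy as np

def exponential_weights(payoffs, eta):
    """EW: play p_i,t proportional to exp(eta * R_i,t) on a T x N payoff matrix."""
    T, N = payoffs.shape
    R = np.zeros(N)                       # cumulative payoffs R_i,t
    gain = 0.0
    for t in range(T):
        w = np.exp(eta * (R - R.max()))   # unnormalized weights w_i,t = e^(eta * R_i,t)
        p = w / w.sum()                   # p_i,t = w_i,t / W_t
        gain += p @ payoffs[t]
        R += payoffs[t]
    return gain, R
```

Taking eta on the order of 1/T**0.5 targets O(T^(1/2)) regret to the best; shrinking it to 1/T**(0.5 + alpha) trades a larger regret to the best for a smaller regret to the average, as stated above.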

Page 9: So far…

[Figure: trade-off plot of regret to the best ~ T^x (horizontal) against regret to the average ~ T^y (vertical); cumulative-difference algorithms trace the frontier x + y = 1 for x between 1/2 and 1]

Page 10: An Unrestricted Lower Bound

• Any algorithm achieving O(T^(1/2)) regret to the best must suffer Ω(T^(1/2)) regret to the average

• Any algorithm achieving O((log(T)·T)^(1/2)) regret to the best must suffer Ω(T^ε) regret to the average, for some constant ε > 0

• Not restricted to cumulative difference algorithms!

[Figure: the same trade-off plot, now also showing the region ruled out for all algorithms (not just cumulative-difference ones) near regret to the best ~ T^(1/2)]

Page 11: A Simple Additive Algorithm

• Once again, 2 experts with instantaneous gains in {0,1}, w initialized to ½

• Let D_t be the difference in cumulative payoffs of the two experts at time t

• The algorithm makes the following updates:
– If the expert gains are (0,0) or (1,1): no change to w
– If the expert gains are (1,0): w ← w + Δ
– If the expert gains are (0,1): w ← w - Δ

• Assume we never reach w = 1

• For any difference D_t = d we have w = ½ + d·Δ

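A sketch of this additive rule for two experts (illustrative names; as on the slide, it assumes w never hits 0 or 1, so there is no clipping):

```python
def simple_additive(payoffs, delta):
    """Two experts with {0,1} gains; keep weight w = 1/2 + delta * D_t on expert 1."""
    w, D, gain = 0.5, 0, 0.0
    for g1, g2 in payoffs:           # payoffs: iterable of (gain_1, gain_2) pairs
        gain += w * g1 + (1 - w) * g2
        D += g1 - g2                 # cumulative payoff difference D_t
        w = 0.5 + delta * D          # same as w <- w + delta on (1,0), w <- w - delta on (0,1)
    return gain
```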

Page 12: Breaking the Ω(T) Barrier

• While |D_t| < H:
– (0,0) or (1,1): no change to w
– (1,0): w ← w + Δ
– (0,1): w ← w - Δ

• When the loop exits, play EW with η chosen so that its regret to the best is O(T^(2/3))

We will analyze what happens:
1. If we stay in the loop
2. If we exit the loop
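A sketch of this two-phase scheme, reusing the exponential_weights sketch from the Page 8 example (H, delta, and eta stand for the slide's parameters; the particular values are instantiated on Page 14):

```python
import numpy as np

def two_phase(payoffs, H, delta, eta):
    """Additive updates while |D_t| < H, then hand the remaining steps to EW."""
    w, D, gain = 0.5, 0, 0.0
    for t, (g1, g2) in enumerate(payoffs):
        if abs(D) >= H:              # exit the loop and switch to exponential weights
            gain += exponential_weights(np.array(payoffs[t:]), eta)[0]
            return gain
        gain += w * g1 + (1 - w) * g2
        D += g1 - g2
        w = 0.5 + delta * D
    return gain                      # stayed in the loop for all T steps
```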

Page 13: Staying in the Loop

While |D_t| < H:
– (0,0) or (1,1): no change to w
– (1,0): w ← w + Δ
– (0,1): w ← w - Δ

Observe that inside the loop R_best,t - R_avg,t < H

So it is enough to bound the regret to the average; the regret to the best exceeds it by at most H

[Figure: while the difference D_t oscillates between d and d+1, each (1,0),(0,1) oscillation loses Δ to both the best and the average]

Regret to the Average: at most T·Δ. Regret to the Best: at most T·Δ + H.

Page 14: Exiting the Loop

While |D_t| < H:
– (0,0) or (1,1): no change to w
– (1,0): w ← w + Δ
– (0,1): w ← w - Δ

When the loop exits, play EW

Upon exit from the loop:
– Regret to the best: still at most H + T·Δ
– Gain over the average: Δ·(1 + 2 + … + H) - T·Δ ~ Δ·H² - T·Δ

• So e.g. H = T^(2/3) and Δ = 1/T gives:
– Regret to the best: < ~T^(2/3) in the loop or upon exit
– Regret to the average: constant in the loop, and a gain of ~T^(1/3) over the average upon exit

• Now run EW with regret ~T^(2/3) to the best and ~T^(1/3) to the average; the banked T^(1/3) gain cancels the latter, leaving O(1) regret to the average overall

[Figure: on each (1,0) step that pushes D_t upward, the learner loses 1 - w to the best but gains w - ½ over the average]
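As a worked check of the gain-over-the-average arithmetic with the slide's choices H = T^(2/3) and Δ = 1/T:

```latex
\Delta\,(1 + 2 + \cdots + H) - T\Delta
  \;=\; \Delta\,\frac{H(H+1)}{2} - T\Delta
  \;\approx\; \frac{1}{T}\cdot\frac{T^{4/3}}{2} - 1
  \;=\; \tfrac{1}{2}\,T^{1/3} - 1
  \;=\; \Theta\!\left(T^{1/3}\right).
```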

Page 15:

[Figure: the trade-off plot once more, with a new point added at regret to the best ~ T^(2/3) and constant regret to the average, below the cumulative-difference frontier]

Page 16: Obliterating the Ω(T) Barrier

• Instead of playing the additive algorithm inside the loop, we can play EW with η = Δ = 1/T

• Instead of having one phase, we can have many:

Set η = 1/T, k = log T

For i = 1 to k
– Reset and run EW with the current value of η until R_best,t - R_avg,t > H = O(T^(1/2))
– Set η = η·2

Reset and run EW with the final value of η

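A sketch of this multi-phase scheme (illustrative only: the log base, the tie between phases and the final run, and the within-phase bookkeeping are assumptions on top of the pseudocode above):

```python
import math
import numpy as np

def phased_ew(payoffs, H):
    """Repeatedly run EW from scratch, doubling eta whenever the gap between the
    best expert and the average (within the current phase) exceeds H ~ sqrt(T)."""
    T, N = payoffs.shape
    eta = 1.0 / T
    k = max(1, int(math.log2(T)))          # k = log T phases (base assumed)
    gain, t = 0.0, 0
    for phase in range(k + 1):             # the extra final run keeps the last eta
        R = np.zeros(N)                    # reset EW's cumulative payoffs
        while t < T:
            w = np.exp(eta * (R - R.max()))
            p = w / w.sum()
            gain += p @ payoffs[t]
            R += payoffs[t]
            t += 1
            if phase < k and R.max() - R.mean() > H:
                break                      # R_best,t - R_avg,t exceeded H: next phase
        eta *= 2                           # double the learning rate
    return gain
```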

Page 17: Extensions and Open Problems

• Known extensions to our algorithm:
– Instead of the average, we can use any static weight vector inside the simplex

• Future goals:
– Nicer dependence on the number of experts (ours is O(log N), while typical bounds are O(sqrt(log N)))
– Generalization to the returns setting and to other loss functions

Page 18: Thanks!

Questions?