TRANSCRIPT
Regret to the Best vs. Regret to the Average

Eyal Even-Dar
Computer and Information Science
University of Pennsylvania

Collaborators: Michael Kearns (Penn), Yishay Mansour (Tel Aviv), Jenn Wortman (Penn)
The No-Regret Setting

• Learner maintains a weighting over N “experts”
• On each of T trials, the learner observes the payoffs of all N experts
  – Payoff to the learner = weighted payoff of the experts
  – The learner then dynamically adjusts its weights
• Let Ri,T be the cumulative payoff of expert i on some sequence of T trials
• Let RA,T be the cumulative payoff of learning algorithm A
• Classical no-regret results: we can produce a learning algorithm A such that, on any sequence of trials,
      RA,T > maxi{Ri,T} – sqrt(log(N)·T)
  – “No regret”: the per-trial regret sqrt(log(N)/T) approaches 0 as T grows
This Work

• We simultaneously examine:
  – Regret to the best expert in hindsight
  – Regret to the average return of all experts
• Note that no learning is required to match the average: the uniform weighting achieves it exactly!
• Why look at the average?
  – A “safety net” or “sanity check”
  – A simple benchmark the algorithm should outperform
  – Future direction: index benchmarks such as the S&P 500
• We assume a fixed horizon T
  – But this can easily be relaxed…
Our Results

• Every difference-based algorithm with O(T^α) regret to the best expert has Ω(T^(1−α)) regret to the average
• There exists a simple difference-based algorithm achieving this tradeoff
• Every algorithm with O(T^(1/2)) regret to the best expert must have Ω(T^(1/2)) regret to the average
• We can produce an algorithm with O(T^(1/2)·log T) regret to the best and O(1) regret to the average
Oscillations: The Cost of an Update

• Consider 2 experts with instantaneous gains in {0,1}
• Let w be the weight on the first expert and initialize w = ½
• Suppose expert 1 gets a gain of 1 on the first time step, and expert 2 gets a gain of 1 on the second…
• The update: after the (1,0) step the weight moves from w to w + Δ; after the (0,1) step it moves back to w

Best, worst, and average all earn 1
Algorithm earns w + (1 – (w + Δ)) = 1 – Δ
Regret to Best = Regret to Worst = Regret to Average = Δ
A Bad Sequence

• Consider the following sequence
  – Expert 1: 1,0,1,0,1,0,1,0,…,1,0
  – Expert 2: 0,1,0,1,0,1,0,1,…,0,1
• We can examine w over time for existing algorithms…
• Follow the Perturbed Leader: ½, ½ + 1/(T(1+ln 2))^(1/2) – 1/(2T), ½, ½ + 1/(T(1+ln 2))^(1/2) – 1/(2T), ½, …
• Weighted Majority: ½, ½ + (ln(2)/2T)^(1/2)/(1+(ln(2)/2T)^(1/2)), ½, ½ + (ln(2)/2T)^(1/2)/(1+(ln(2)/2T)^(1/2)), ½, …
• Both will lose to the best, the worst, and the average
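The effect can be checked numerically. A minimal sketch (my own illustration, not FPL or Weighted Majority themselves; the function name `simulate_oscillation` and the fixed oscillation size `delta` are assumptions standing in for the small weight bumps those algorithms make):

```python
def simulate_oscillation(T, delta):
    """Alternating gains: even steps (1,0), odd steps (0,1). The weight on
    expert 1 starts at 1/2, moves to 1/2 + delta after each (1,0) step and
    back to 1/2 after each (0,1) step."""
    w = 0.5
    learner = e1 = e2 = 0.0
    for t in range(T):
        if t % 2 == 0:            # gains (1, 0)
            learner += w
            e1 += 1
            w = 0.5 + delta       # update after observing the step
        else:                     # gains (0, 1)
            learner += 1 - w
            e2 += 1
            w = 0.5
    avg = (e1 + e2) / 2
    return learner, e1, e2, avg

# For T = 1000 and delta = 0.01: best, worst, and average each earn 500,
# while the learner loses delta per pair, earning 500 - 500*delta.
learner, e1, e2, avg = simulate_oscillation(1000, 0.01)
```

Any learner whose weight oscillates by a fixed amount on this sequence pays that amount per pair of steps, to the best, the worst, and the average alike.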
A Simple Trade-off: The Ω(T) Barrier

• Again, consider 2 experts with instantaneous gains in {0,1}
• Let w be the weight on the first expert and initialize w = ½
• We first examine algorithms that depend only on the cumulative difference in payoffs
  – The insight holds more generally for aggressive updating

[Figure: the adversary first plays (1,0) repeatedly, pushing w upward from ½.
Either w stays below 2/3 for L steps, so the regret to the best is > L/3, or
some single update satisfies Δt > 1/(6L). In the latter case the sequence then
oscillates at that point for the remaining T steps, and the regret to the
average is ~ (T/2)·(1/(6L)) ~ T/L.]

Regret to Best × Regret to Average ~ Ω(T)!
Exponential Weights [F94]

• Unnormalized weight on expert i at time t: wi,t = e^(η·Ri,t)
• Define Wt = ∑i wi,t, so we have pi,t = wi,t / Wt
• Let N be the number of experts
• Setting η = O(1/T^(1/2)) achieves O(T^(1/2)) regret to the best
• Setting η = O(1/T^(1/2+α)) achieves O(T^(1/2+α)) regret to the best
• It can be shown that with η = O(1/T^(1/2+α)), the regret to the average is O(T^(1/2−α))
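A minimal sketch of the update rule defined on this slide (the helper name `exponential_weights` and the test sequence are my own; the η tuning follows the slide):

```python
import math

def exponential_weights(gains, eta):
    """gains: list of T rounds, each a list of N expert payoffs in [0,1].
    Returns the learner's cumulative (expected) payoff."""
    N = len(gains[0])
    R = [0.0] * N                                  # cumulative payoffs R_{i,t}
    total = 0.0
    for g in gains:
        weights = [math.exp(eta * r) for r in R]   # w_{i,t} = e^(eta * R_{i,t})
        Z = sum(weights)
        p = [w / Z for w in weights]               # p_{i,t} = w_{i,t} / W_t
        total += sum(pi * gi for pi, gi in zip(p, g))
        R = [r + gi for r, gi in zip(R, g)]
    return total

T = 100
gains = [[1, 0] if t % 2 == 0 else [0, 1] for t in range(T)]
payoff = exponential_weights(gains, 1 / math.sqrt(T))   # eta ~ 1/sqrt(T)
```

On the alternating "bad sequence" above, any positive η earns strictly less than the T/2 that the uniform weighting (η = 0) collects, illustrating the cost of updating.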
So far…

[Chart: x-axis = regret to best ~ T^x, y-axis = regret to average ~ T^y.
For cumulative-difference algorithms the achievable frontier runs from
(x, y) = (1/2, 1/2) out to x = 1 along the tradeoff line x + y = 1.]
An Unrestricted Lower Bound

• Any algorithm achieving O(T^(1/2)) regret to the best must suffer Ω(T^(1/2)) regret to the average
• Any algorithm achieving O((T log T)^(1/2)) regret to the best must suffer Ω(T^δ) regret to the average, for some constant δ > 0
• Not restricted to cumulative-difference algorithms!

[Chart: as before, regret to best ~ T^x vs. regret to average ~ T^y; the
(1/2, 1/2) tradeoff point now marks a bound for all algorithms, not only
cumulative-difference algorithms.]
A Simple Additive Algorithm

• Once again, 2 experts with instantaneous gains in {0,1}, w initialized to ½
• Let Dt be the difference in cumulative payoffs of the two experts at time t
• The algorithm makes the following updates:
  – If expert gains are (0,0) or (1,1): no change to w
  – If expert gains are (1,0): w ← w + Δ
  – If expert gains are (0,1): w ← w − Δ
• Assume we never reach w = 1
• For any difference Dt = d we have w = ½ + dΔ
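The additive rule is easy to state in code. A sketch (the function name `additive_play` and the gain encoding as pairs are my own) that also exposes the invariant w = ½ + Dt·Δ:

```python
def additive_play(gains, delta):
    """gains: list of (g1, g2) pairs with each g in {0,1}.
    Returns the learner's payoff and the final difference D_t.
    Invariant maintained throughout: w = 1/2 + D * delta."""
    w, D, learner = 0.5, 0, 0.0
    for g1, g2 in gains:
        learner += w * g1 + (1 - w) * g2   # payoff before updating
        if (g1, g2) == (1, 0):             # (1,0): w <- w + delta
            w += delta
            D += 1
        elif (g1, g2) == (0, 1):           # (0,1): w <- w - delta
            w -= delta
            D -= 1
        # (0,0) or (1,1): no change, as on the slide
    return learner, D
```

For example, three (1,0) steps with Δ = 0.01 earn 0.5 + 0.51 + 0.52 = 1.53 and leave D = 3, i.e. w = 0.53.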
Breaking the Ω(T) Barrier

• While |Dt| < H:
  – (0,0) or (1,1): no change to w
  – (1,0): w ← w + Δ
  – (0,1): w ← w − Δ
• Then play EW with an appropriately chosen η

We will analyze what happens:
1. If we stay in the loop
2. If we exit the loop
Staying in the Loop

While |Dt| < H:
– (0,0) or (1,1): no change to w
– (1,0): w ← w + Δ
– (0,1): w ← w − Δ

• While in the loop, observe Rbest,t − Ravg,t < H
  – So it is enough to compute the regret to the average
• Each oscillation of Dt (from d up to d+1 and back) loses Δ to both the best and the average, exactly as in the two-step example earlier

Regret to the Average: at most ΔT
Regret to the Best: at most H + ΔT
Exiting the Loop

While |Dt| < H:
– (0,0) or (1,1): no change to w
– (1,0): w ← w + Δ
– (0,1): w ← w − Δ
Then play EW

• On each (1,0) step at distance d we lose 1 − w to the best, but gain w − ½ = dΔ over the average
• Upon exit from the loop:
  – Regret to the best: still at most H + ΔT
  – Gain over the average: (Δ + 2Δ + … + HΔ) − ΔT ~ H²Δ − ΔT
• So e.g. H = T^(2/3) and Δ = 1/T gives
  – Regret to best: < T^(2/3) in loop or upon exit
  – Regret to average: constant in loop; a gain of T^(1/3) upon exit
• Now run EW with regret T^(2/3) to the best and T^(1/3) to the average; the banked gain of T^(1/3) offsets EW's regret to the average
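As a quick arithmetic check of the parameter choice above (plain Python, with a concrete horizon T = 10^6 chosen for illustration):

```python
# Concrete horizon, chosen for illustration only
T = 10**6
H = T ** (2 / 3)          # ~ 10^4
delta = 1 / T

# Regret to the best, in loop or upon exit: at most H + delta*T ~ T^(2/3)
regret_best_bound = H + delta * T

# Gain over the average upon exit: ~ H^2 * delta - delta*T ~ T^(1/3)
gain_over_average = H ** 2 * delta - delta * T
```

With these values, the bound on the regret to the best is about 10^4 + 1 (of order T^(2/3)) while the banked gain over the average is about 10^2 − 1 (of order T^(1/3)), matching the exponents on the slide.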
[Chart: regret to best ~ T^x vs. regret to avg ~ T^y. The bound for all
algorithms and the tradeoff line for cumulative-difference algorithms appear
as before; the new algorithm adds the point x = 2/3, y = 0 (constant regret
to the average), beating the cumulative-difference tradeoff.]
Obliterating the Ω(T) Barrier

• Instead of playing the additive algorithm inside the loop, we can play EW with η = Δ = 1/T
• Instead of having one phase, we can have many:

Set η = 1/T, k = log T
For i = 1 to k:
  – Reset and run EW with the current value of η until Rbest,t − Ravg,t > H = O(T^(1/2))
  – Set η = η · 2
Reset and run EW with the final value of η
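A runnable sketch of this phased scheme, under my own simplifying assumptions (2 experts, {0,1} gains, base-2 doubling, a per-round check of the best-vs-average gap, and the name `phased_ew`); a sketch, not the paper's exact algorithm:

```python
import math

def phased_ew(gains, T):
    """Phased EW: start with a tiny eta, then reset and double eta whenever
    the best expert pulls ahead of the average by more than H."""
    eta = 1.0 / T
    H = math.sqrt(T)                  # threshold H = O(T^(1/2))
    k = max(1, int(math.log2(T)))     # k = log T phases (base 2 here)
    phase = 0
    R = [0.0, 0.0]                    # cumulative payoffs over the whole run
    R_phase = [0.0, 0.0]              # cumulative payoffs since the last reset
    learner = 0.0
    for g in gains:
        Z = sum(math.exp(eta * r) for r in R_phase)
        p = [math.exp(eta * r) / Z for r in R_phase]
        learner += p[0] * g[0] + p[1] * g[1]
        for i in (0, 1):
            R[i] += g[i]
            R_phase[i] += g[i]
        if phase < k and max(R) - sum(R) / 2 > H:
            phase += 1                # exit this phase:
            eta *= 2                  # double eta and reset EW
            R_phase = [0.0, 0.0]
    return learner
```

On the alternating bad sequence the gap never exceeds H, so the scheme stays in its cautious first phase and tracks the average; when one expert runs away, the doublings quickly make η aggressive enough to follow the best.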
Extensions and Open Problems

• Known extensions to our algorithm:
  – Instead of the average, we can compete with any static weight vector inside the simplex
• Future goals:
  – Nicer dependence on the number of experts
    • Ours is O(log N); it is typically O(sqrt(log N))
  – Generalization to the returns setting and to other loss functions
Thanks!
Questions?