Low-regret Online Decision-making Via Bellman Inequalities
Joint work with Sid Banerjee and Itai Gurvich
Relaxations and Regret Bounds for Online Problems
● Must make decisions upon request
● Uncertain process
● Statistical information available
● Goal: develop practical near-optimal algorithms
Our Results
Meta-Theorem: For different resource allocation problems, we
give a practical policy, based on re-solving an optimization
program, with bounded regret.
The bound is independent of the horizon and capacities.
● Applications: Dynamic posted pricing, Online Knapsack, Network Revenue Management (Online Packing), Online Matching, Online Probing, Contextual Bandits
● Challenges: define a benchmark and use it to design an algorithm
Why Constant Regret?
Case Study: edge weighted online matching
● Algorithms are different
● Not worst case, but parametric
Problem 1: Online Knapsack
● Finite set of types:
● Known reward distribution and weight:
● Initial budget and horizon:
● Arrival process:
● Objective: collect as much reward as possible
Types of Benchmark
[Figure: reward comparison of Algorithm, Optimal (DP), and Prophet, with the regret marked.]
Number of type- arrivals
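As a rough illustration of how a prophet benchmark and the resulting regret can be computed on a single sample path, here is a minimal Python sketch for the knapsack setting. Everything in it is an assumption made for illustration: the `TYPES` table, the deterministic rewards (the slides allow a reward distribution), the integer weights, and the naive accept-if-it-fits policy standing in for "Algorithm".

```python
from typing import Dict, List, Tuple

# Hypothetical instance: type -> (reward, integer weight). Rewards are taken
# as deterministic here, a simplification of the slide's "reward distribution".
TYPES: Dict[str, Tuple[float, int]] = {"a": (10.0, 3), "b": (6.0, 2), "c": (1.0, 1)}


def prophet_value(arrivals: List[str], budget: int) -> float:
    """Best total reward in hindsight: a 0/1 knapsack over the realized arrivals."""
    best = [0.0] * (budget + 1)  # best[b] = max reward using at most b units of budget
    for t in arrivals:
        reward, weight = TYPES[t]
        # Iterate budgets downward so each arrival is used at most once.
        for b in range(budget, weight - 1, -1):
            best[b] = max(best[b], best[b - weight] + reward)
    return best[budget]


def greedy_value(arrivals: List[str], budget: int) -> float:
    """A naive online policy (illustrative only): accept every arrival that fits."""
    total = 0.0
    for t in arrivals:
        reward, weight = TYPES[t]
        if weight <= budget:
            budget -= weight
            total += reward
    return total


def regret(arrivals: List[str], budget: int) -> float:
    """Per-sample-path regret of the naive policy against the prophet."""
    return prophet_value(arrivals, budget) - greedy_value(arrivals, budget)
```

On the path `["c", "b", "a", "a"]` with budget 6, the naive policy fills the knapsack with the first three arrivals, while the prophet waits for the two type-`a` arrivals; the gap between the two is the regret on that path.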
Online Packing
Theorem A natural policy with constant expected regret
for online packing problems. Regret independent of .
In particular, the regret depends only on
Generalizes to multiple resources and other arrival processes.
Similar results in a recent work for restricted cases [Bumpensanti & Wang]
Overview of the General Framework
Goal: Handle more general problems
Intuition
Given the additional information, Prophet wants to solve a DP
Bellman Loss (computational)
Information Loss (estimation)
Knapsack RABBI
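To illustrate what "re-solving an optimization program" can look like for the knapsack problem, here is a minimal Python sketch. It is an assumption-laden illustration, not the authors' stated algorithm: the fluid (fractional knapsack) relaxation, the use of expected remaining arrivals, the accept-if-at-least-half threshold, and all names (`fluid_accept_amounts`, `resolve_policy`, `types`, `probs`) are hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

TypeTable = Dict[str, Tuple[float, int]]  # type -> (reward, weight), hypothetical


def fluid_accept_amounts(
    types: TypeTable, expected: Dict[str, float], budget: float
) -> Dict[str, float]:
    """Solve max sum r_j x_j  s.t.  sum w_j x_j <= budget, 0 <= x_j <= expected[j].

    With a single budget constraint, filling by reward-per-weight is optimal.
    """
    x = {j: 0.0 for j in types}
    remaining = float(budget)
    by_density = sorted(types, key=lambda k: types[k][0] / types[k][1], reverse=True)
    for j in by_density:
        _, weight = types[j]
        take = max(0.0, min(expected[j], remaining / weight))
        x[j] = take
        remaining -= take * weight
    return x


def resolve_policy(
    arrivals: List[str], types: TypeTable, probs: Dict[str, float], budget: int
) -> float:
    """Re-solve the fluid program at every arrival and act on its solution."""
    total = 0.0
    horizon = len(arrivals)
    for t, j in enumerate(arrivals):
        periods_left = horizon - t  # the current arrival plus those still to come
        expected = {k: probs[k] * periods_left for k in types}
        x = fluid_accept_amounts(types, expected, budget)
        reward, weight = types[j]
        # Assumed rule: accept when the fluid solution accepts at least half
        # of the expected remaining type-j arrivals and the item still fits.
        if weight <= budget and x[j] >= expected[j] / 2:
            budget -= weight
            total += reward
    return total
```

With `types = {"a": (10.0, 3), "b": (6.0, 2), "c": (1.0, 1)}`, `probs = {"a": 0.5, "b": 0.25, "c": 0.25}`, budget 6, and the path `["c", "a", "b", "a"]`, this sketch rejects the low-value arrivals and keeps the budget for the two type-`a` arrivals.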
Problem 2: Dynamic Posted Pricing
● Stream of T customers with i.i.d. rewards
● Each customer wants one of our identical items
● We can post any fare from the set
● Objective: collect as much reward as possible
Prophet solves:
?
Pricing RABBI
Fraction of customers that would buy when the fare is
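The purchase fraction at a given fare can be made concrete with a small Python sketch. It assumes a discrete valuation distribution and that a customer buys exactly when their valuation is at least the posted fare; both the purchase rule and the names `buy_fraction` and `revenue_per_customer` are assumptions for illustration, not taken from the slides.

```python
from typing import Dict, List


def buy_fraction(valuations: Dict[float, float], fare: float) -> float:
    """Probability that a customer buys at `fare`.

    `valuations` maps each possible valuation to its probability (hypothetical
    input format). A customer is assumed to buy iff valuation >= fare.
    """
    return sum(p for v, p in valuations.items() if v >= fare)


def revenue_per_customer(valuations: Dict[float, float], fares: List[float]) -> Dict[float, float]:
    """Expected revenue from a single customer at each candidate fare."""
    return {f: f * buy_fraction(valuations, f) for f in fares}
```

For example, with valuations `{1.0: 0.25, 2.0: 0.5, 4.0: 0.25}` and fares `[1.0, 2.0, 4.0]`, the purchase fractions are 1.0, 0.75, and 0.25, so the middle fare gives the highest expected revenue per customer when inventory is not binding.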
Dynamic Posted Pricing
Theorem A natural policy with constant expected regret
for Dynamic Posted Pricing. Regret independent of .
In particular, the regret depends only on .
Fraction that buys at
The Algorithm is Practical
Bound via Bellman Inequalities
Definition: Given filtration , is a relaxed value w.r.t. if
1) Initial Ordering:
2) Monotonicity:
Conclusions and Extensions
● Framework based on constructing tractable benchmarks
● Bellman Loss: computational
● Information Loss: estimation
● Applications: NRM, Probing, Contextual Bandits, AdWords, Dynamic Pricing, and other Resource Allocation Problems
Related Work
● Prophet: worst-case distribution (competitive ratio) for maximum of i.i.d. [Hill & Kertz], best possible [Correa et al.], matroid constraints [Kleinberg & Weinberg]
● Constant regret in NRM: [Arlotto & Gurvich]
[Talluri & Van Ryzin], [Reiman & Wang], [Jasin & Kumar], [Bumpensanti & Wang]
● Online matching, resource allocation, AdWords: [Manshadi et al.], [Legrain & Jaillet]
● Probing: competitive ratio (linear regret) [Gupta & Nagarajan], [Singla], [Chugg & Maehara]
● Information Relaxation: [Balseiro & Brown], [Brown, Smith, & Sun]
● Approximate Dynamic Programming: [Powell]