online algorithms for weighted paging with predictionsdebmalya/papers/icalp20-paging.pdf · 1...

Online Algorithms for Weighted Paging with1

Predictions2

Zhihao Jiang13

Tsinghua University, China4

Debmalya Panigrahi5

Duke University, USA6

Kevin Sun7

Duke University, USA8

Abstract9

In this paper, we initiate the study of the weighted paging problem with predictions. This continues10

the recent line of work in online algorithms with predictions, particularly that of Lykouris and11

Vassilvitski (ICML 2018) and Rohatgi (SODA 2020) on unweighted paging with predictions. We12

show that unlike unweighted paging, neither a fixed lookahead nor knowledge of the next request13

for every page is sufficient information for an algorithm to overcome existing lower bounds in14

weighted paging. However, a combination of the two, which we call the strong per request prediction15

(SPRP) model, suffices to give a 2-competitive algorithm. We also explore the question of gracefully16

degrading algorithms with increasing prediction error, and give both upper and lower bounds for a17

set of natural measures of prediction error.18

2012 ACM Subject Classification Theory of computation → Online algorithms19

Keywords and phrases Online algorithms, paging20

Digital Object Identifier 10.4230/LIPIcs.ICALP.2020.6921

Funding Debmalya Panigrahi: Supported in part by NSF grants CCF-1535972, CCF-1955703, an22

NSF CAREER Award CCF-1750140, and the Indo-US Virtual Networked Joint Center on Algorithms23

under Uncertainty.24

Kevin Sun: Supported in part by NSF grants CCF-1535972, CCF-1527084, CCF-1955703, and an25

NSF CAREER Award CCF-1750140.26

1 Work done while visiting Duke University.

© Zhihao Jiang, Debmalya Panigrahi, and Kevin Sun;licensed under Creative Commons License CC-BY

47th International Colloquium on Automata, Languages, and Programming (ICALP 2020).Editors: Artur Czumaj, Anuj Dawar, and Emanuela Merelli; Article No. 69; pp. 69:1–69:18

Leibniz International Proceedings in InformaticsSchloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany

https://doi.org/10.4230/LIPIcs.ICALP.2020.69

https://creativecommons.org/licenses/by/3.0/

https://www.dagstuhl.de/lipics/

https://www.dagstuhl.de

69:2 Online Algorithms for Weighted Paging with Predictions

1 Introduction27

The paging problem is among the most well-studied problems in online algorithms. In this28

problem, there is a set of n pages and a cache of size k < n. The online input comprises a29

sequence of requests for these pages. If the requested page is already in the cache, then the30

algorithm does not need to do anything. But, if the requested page is not in the cache, then31

the algorithm suffers what is known as a cache miss and must bring the requested page into32

the cache. If the cache is full, then an existing page must be evicted from the cache to make33

room for the new page. The goal of the online algorithm is to minimize the total number of34

cache misses in the unweighted paging problem, and the total weight of the evicted pages in35

the weighted paging problem. It is well-known that for both problems, the best deterministic36

algorithms have a competitive ratio of O(k) and the best randomized algorithms have a37

competitive ratio of O(log k) (see, e.g., [4, 2]).38

Although the paging problem is essentially solved from the perspective of competitive39

analysis, it also highlights the limitations of this framework. For instance, it fails to distinguish40

between algorithms that perform nearly optimally in practice such as the least recently used41

(LRU) rule and very naïve strategies such as flush when full that evicts all pages whenever the42

cache is full. In practice, paging algorithms are augmented with predictions about the future43

(such as those generated by machine learning models) to improve their empirical performance.44

To model this, for unweighted paging, several lookahead models have been proposed where45

only a partial prediction of the future leads to algorithms that are significantly better than46

what can be obtained in traditional competitive analysis. But, to the best of our knowledge,47

no such results were previously known for the weighted paging problem. In this paper, we48

initiate the study of the weighted paging problem with future predictions.49

For unweighted paging, it is well-known that evicting the page whose next request is50

farthest in the future (also called Belady’s rule) is optimal. As a consequence, it suffices51

for an online algorithm to simply predict the next request of every page (we call this per52

request prediction or PRP in short) in order to match offline performance. In fact, Lykouris53

and Vassilvitskii [9] (see also Rohatgi [13]) showed recently that in this prediction model,54

one can simultaneously achieve a competitive ratio of O(1) if the predictions are accurate,55

and O(log k) regardless of the quality of the predictions. Earlier, Albers [1] used a different56

prediction model called `-strong lookahead, where we predict a sequence of future requests57

that includes ` distinct pages (excluding the current request). For ` = n− 1, this prediction58

is stronger than the PRP model, since the algorithm can possibly see multiple requests for a59

page in the lookahead sequence. But, for ` < n− 1, which is typically the setting that this60

model is studied in, the two models are incomparable. The main result in [1] is to show that61

one can obtain a constant approximation for unweighted paging for ` ≥ k − 2.62

Somewhat surprisingly, we show that neither of these models are sufficient for weighted63

paging. In particular, we show a lower bound of Ω(k) for deterministic algorithms and64

Ω(log k) for randomized algorithms in the PRP model. These lower bounds match, up to65

constants, standard lower bounds for the online paging problem (without prediction) (see,66

e.g., [11]), hence establishing that the PRP model does not give any advantage to the online67

algorithm beyond the strict online setting. Next, we show that for `-strong lookahead, even68

with ` = k, there are lower bounds of Ω(k) for deterministic algorithms and Ω(log k) for69

randomized algorithms, again asymptotically matching the lower bounds from online paging70

without prediction. Interestingly, however, we show that a combination of these prediction71

models is sufficient: if ` = n − 1 in the strong lookahead setting, then we get predictions72

that subsume both models; and, in this case, we give a simple deterministic algorithm with a73

Z. Jiang, D. Panigrahi, K. Sun 69:3

competitive ratio of 2 for weighted paging, thereby overcoming the online lower bounds.74

Obtaining online algorithms with predictions, however, is fraught with the risk that the75

predictions are inaccurate which renders the analysis of the algorithms useless. Ideally, one76

would therefore, want the algorithms to also be robust, in that their performance gracefully77

degrades with increasing prediction error. Recently, there has been significant interest in78

designing online algorithms with predictions that achieve both these goals, of matching79

nearly offline performance if the predictions are correct, and of gracefully degrading as the80

prediction error increases. Originally proposed for the (unweighted) paging problem [9], this81

model has gained significant traction in the last couple of years and has been applied to82

problems in data structures [10], online decision making [12, 6], scheduling theory [12, 8],83

frequency estimation [7], etc. Our final result contributes to this line of research.84

First, if the online algorithm and offline optimal solution both use a cache of size k, then85

we show that no algorithm can asymptotically benefit from the predictions while achieving86

sublinear dependence on the prediction error. Moreover, if we make the relatively modest87

assumption that the algorithm is allowed a cache that contains just 1 extra slot than that of88

the optimal solution, then we can achieve constant competitive ratio when the prediction89

error is small.90

1.1 Overview of models and our results91

Our first result is a lower bound for weighted paging in the PRP model. Recall that in the92

PRP model, in addition to the current page request, the online algorithm is provided the93

time-step for the next request of the same page. For instance, if the request sequence is94

(a, b, a, c, d, b, . . .), then at time-step 1, the algorithm sees request a and is given position 3,95

and at time-step 2, the algorithm sees request b and is given position 6.96

I Theorem 1. For weighted paging with PRP, any deterministic algorithm is Ω(k)-competitive,97

and any randomized algorithm is Ω(log k)-competitive.98

Note that these bounds are tight, because there exist online algorithms without prediction99

whose competitive ratios match these bounds (see Chrobak et al. [4] and Bansal et al. [2]).100

Next, for the `-strong lookahead model, we show lower bounds for weighted paging. Recall101

that in this model, the algorithm is provided a lookahead into future requests that includes102

` distinct pages. For instance, if ` = 3 and the request sequence is (a, b, a, c, d, b, . . .), then103

at time-step 1, the algorithm sees request a and is given the lookahead sequence (b, a, c)104

since it includes 3 distinct pages. At time step 2, the algorithm sees request b and is given105

(a, c, d). Note the difference with the PRP model, which would not be give the information106

that the request in time-step 5 is for page d, but does give the information that the request107

in time-step 6 is for page b.108

I Theorem 2. For weighted paging with `-strong lookahead where ` ≤ n−k, any deterministic109

algorithm is Ω(k)-competitive, and any randomized algorithm is Ω(log k)-competitive.110

For weighted paging with `-strong lookahead where n−k+1 ≤ ` ≤ n−1, any deterministic111

algorithm is Ω(n− `)-competitive, and any randomized algorithm is Ω(log(n− `))-competitive.112

In contrast to these lower bounds, we show that a prediction model that combines features113

of these individual models gives significant benefits to an online algorithm. In particular,114

combining PRP and `-strong lookahead, we define the following prediction model:115

116

SPRP (“strong per-request prediction”): On a request for page p, the predictorgives the next time-step when p will be requested and all page requests till that request.117

ICALP 2020


This is similar to (n − 1)-strong lookahead, but is slightly weaker in that it does not118

provide the first request of every page at the outset. After each of the n pages has been119

requested, SPRP and (n− 1)-strong lookahead are equivalent.120

I Theorem 3. There is a deterministic 2-competitive for weighted paging with SPRP.121

So far, all of these results assume that the prediction model is completely correct.122

However, in general, predictions can have errors, and therefore, it is desirable that an123

algorithm gracefully degrades with increase in prediction error. To this end, we also give124

upper and lower bounds in terms of the prediction error.125

For unweighted paging, Lykouris and Vassilvitski [9] basically considered two measures126

of prediction error. The first, called `pd in this paper, is defined as follows: For each input127

request pt, we increase `pd by w(pt) times the absolute difference between the predicted128

next-arrival time and the actual next-arrival time. For unweighted paging, Lykouris and129

Vassilvitskii [9] gave an algorithm with cost O(OPT +√`pd · OPT). Unfortunately, we rule130

out an analogous result for weighted paging.131

I Theorem 4. For weighted paging with SPRP, there is no deterministic algorithm whose cost132

is o(k)·OPT+o(`pd), and there is no randomized algorithm whose cost is o(log k)·OPT+o(`pd).133

It turns out that the `pd error measure is closely related to another natural error measure134

that we call the `1 measure. This is defined as follows: for each input request pt, if the135

prediction qt is not the same as pt, then increase `1 by the sum of weights w(pt) + w(qt).136

(This is the `1 distance between the predictions and actual requests in the standard weighted137

star metric space for the weighted paging problem.) The lower bound for `pd continues to138

hold for `1 as well, and is tight.139


is o(k) ·OPT+o(`1), and there is no randomized algorithm whose cost is o(log k) ·OPT+o(`1).141

Furthermore, there is a deterministic algorithm with SPRP with cost O(OPT + `1).142

One criticism of both the `pd and `1 error measures is that they are not robust to insertions143

or deletions from the prediction stream. To counter this, Lykouris and Vassilvitski [9] used a144

variant of the classic edit distance measure, and showed a constant competitive ratio for this145

error measure. For weighted paging, we also consider a variant of edit distance, called èd and146

formally defined in Section 5, which allows insertions and deletions between the predicted147

and actual request streams.2 Unfortunately, as with `pd and `1, we rule out algorithms148

that asymptoticaly benefit from the predictions while achieving sublinear dependence on èd.149

Furthermore, if the algorithm were to use a cache with even one extra slot than the optimal150

solution, then we show that even for weighted paging, we can achieve a constant competitive151

algorithm. We summarize these results in the next theorem.152


is o(k)·OPT+o(èd), and there is no randomized algorithm whose cost is o(log k)·OPT+o(èd).154

In the same setting, there exists a randomized algorithm that uses a cache of size k+ 1 whose155

cost is O(OPT + èd), where OPT uses a cache of size k.156

2 For technical reasons, neither èd in this paper nor the edit distance variant in [9] exactly match theclassical definition of edit distance.


1.2 Related work157

We now give a brief overview of the online paging literature, highlighting the results that158

consider a prediction model for future requests. For unweighted paging, the optimal offline159

algorithm is Belady’s algorithm, which always evicts the page that appears farthest in the160

future [3]. For online paging, Sleator and Tarjan [14] gave a deterministic k-competitive161

algorithm, and Fiat et al. [5] gave a randomized O(log k)-competitive algorithm; both results162

were also shown to be optimal. For weighted online paging, Chrobak et al. [4] gave a163

deterministic k-competitive algorithm, and Bansal et al. [2] gave an O(log k)-competitive164

randomized algorithm, which are also optimal by extension.165

Recently, Lykouris and Vassilvitskii [9] introduced a prediction model that we call166

PRP in this paper: on each request p, the algorithm is given a prediction of the next167

time at which p will be requested. For unweighted paging, they gave a randomized168

algorithm, based on the “marker” algorithm of Fiat et al. [5], with competitive ratio169

O(min(√`pd/OPT, log k)). Here, `pd is the absolute difference between the predicted arrival170

and actual arrival times of requests, summed across all requests. They also perform a tighter171

analysis yielding a competitive ratio of O(min(ηed/OPT, log k)), where ηed is the edit distance172

between the predicted sequence and the actual input. Subsequently, Rohatgi [13] improved173

the former bound to O(1 + min((`pd/OPT)/k, 1) log k) and also proved a lower bound of174

Ω(log min((`pd/OPT)/(k log k), k)).175

Albers [1] studied the `-strong lookahead model: on each request p, the algorithm is176

shown the next ` distinct requests after p and all pages within this range. For unweighted177

paging, Albers [1] gave a deterministic (k − `)-competitive algorithm and a randomized178

2Hk−`-competitive algorithm. Albers also showed that these bounds are essentially tight:179

if l ≤ k − 2, then any deterministic algorithm has competitive ratio at least k − `, and any180

randomized algorithm has competitive ratio at least Ω(log(k − `)).181

Finally, we review the paging model in which the offline adversary is restricted to a182

cache of size h < k, while the online algorithm uses a larger cache of size k. For this model,183

Young [16] gave a deterministic algorithm with competitive ratio k/(k − h+ 1) and showed184

that this is optimal. In another paper, Young [15] showed that the randomized “marker”185

algorithm is O(log(k/k − h))-competitive and this bound is optimal up to constants.186

Roadmap187

In Section 2, we show the lower bounds stated in Theorem 1 for the PRP model. The lower188

bounds for the `-strong lookahead model stated in Theorem 2 are proven in Section 3. In189

Section 4, we state and analyze the algorithm for the SPRP model with no error, thereby190

proving Theorem 3. Finally, in Section 5, we consider the SPRP model with errors, and focus191

on the upper and lower bounds in Theorems 4, 5, and 6. Detailed proofs of these bounds192

appear in the full version of this paper.193

2 The Per-Request Prediction Model (PRP)194

In this section, we give the lower bounds stated in Theorem 1 for the PRP model. Our195

strategy, at a high level, will be the same in both the deterministic and randomized cases: we196

consider the special case where the cache size is exactly one less than the number of distinct197

pages. We then provide an algorithm that generates a specific input. In the deterministic198

case, this input will be adversarial, based on the single page not being in the cache at any199

time. In the randomized case, the input will be oblivious to the choices made by the paging200

ICALP 2020


algorithm but will be drawn from a distribution. We will give a brief overview of the main201

ideas that are common to both lower bound constructions first, and then give the details of202

the randomized construction in this section. The details of the deterministic construction203

are deferred to the full paper.204

Let us first recall the Ω(k) deterministic lower bound for unweighted caching without205

predictions. Suppose the cache has size k and the set of distinct pages is a0, a1, . . . , ak. At206

each step, the adversary requests the page a` not contained in the cache of the algorithm207

ALG. Then ALG incurs a miss at every step, while OPT, upon a miss, evicts the page whose208

next request is furthest in the future. Therefore, ALG misses at least k more times before209

OPT misses again.210

Ideally, we would like to imitate this construction. But, the adversary cannot simply211

request the missing page a` because that could violate the predictions made on previous212

requests. Our first idea is to replace this single request for a` with a “block” of requests213

of pages containing a` in a manner that all the previous predictions are met, but ALG still214

incurs the cost of page a` in serving this block of requests.215

But, how do we guarantee that OPT only misses requests once for every k blocks? Indeed,216

it is not possible to provide such a guarantee. Instead, as a surrogate for OPT, we use an217

array of k algorithms ALGi for 1 ≤ i ≤ k, where each ALGi follows a fixed strategy: maintain218

all pages except a0 and ai permanently in the cache, and swap a0 and ai as required to serve219

their requests. Our goal is to show that the sum of costs of all these algorithms is a lower220

bound (up to constants) on the cost of ALG; this would clearly imply an Ω(k) lower bound.221

This is where the weights of pages come handy. We set the weight w(ai) of page ai222

in the following manner: w(ai) = ci for some constant c ≥ 2. Now, imagine that a block223

requested for a missing page a` only contains pages a0, a1, . . . , a` (we call this an `-block).224

The algorithms ALGi for i ≤ ` suffer a cache miss on page ai in this block, while the remaining225

algorithms ALGi for i > ` do not suffer a cache miss in this block. Moreover, the sum of226

costs of all the algorithms ALGi for i ≤ ` in this block is at most a constant times that of227

the cost of ALG alone, because of the geometric nature of the cost function.228

The only difficulty is that by constructing blocks that do not contain pages ai for i > `,229

we might be violating the previous predictions for these pages. To overcome this, we create230

an invariant where for every i, an (i + 1)-block must be introduced after a fixed number231

of i-blocks. Because of this invariant, we are sometimes forced to introduce a larger block232

than that demanded by the missing page in ALG. To distinguish between these two types of233

blocks, we call the ones that exactly correspond to the missing page a regular block, and234

the ones that are larger irregular blocks. Irregular blocks help preserve the correctness of all235

previous predictions, but the sum of costs of ALGi’s on an irregular block can no longer be236

bounded against that of ALG. Nevertheless, we can show that the number of irregular blocks237

is small enough that this extra cost incurred by ALGi’s in irregular blocks can be charged off238

to the regular blocks, thereby proving the deterministic lower bound:239

I Theorem 7. For weighted paging with PRP, any deterministic algorithm is Ω(k)-competitive.240

A formal proof of this theorem is deferred to the full paper. Instead, we focus on proving241

the lower bound for randomized algorithms.242

2.1 Randomized Lower Bound243

This subsection is devoted to proving the following theorem:244

I Theorem 8. For weighted paging with PRP, any randomized algorithm is Ω(log k)-245

competitive.246


Here, we still use the same idea of request blocks, but now the input is derived from a fixed247

distribution and is not aware of the state of ALG. The main idea is to design a distribution248

over block sizes in a manner that still causes any fixed deterministic algorithm ALG to suffer249

a large cost in expectation, and then invoke Yao’s minimax principle to translate this to a250

randomized lower bound.251

Let Hk = 1 + 1/2 + · · · + 1/k ≈ ln k denote the k-th harmonic number. The input is252

defined as follows:253

1. For 0 ≤ i ≤ k, set ui = (2ckHk + 2)i and let yi = 0 for i < k.254

2. Repeat the following:255

a. Select a value of ` according to the following probability distribution: Pr[` = j] = c−1cj+1256

for j ∈ 0, 1, . . . , k − 1 and Pr[` = k] = 1ck .257

b. Increase ` until ` = k or y` < 2ckHk.258

c. For j from 0 to `,259

i. Set all requests from time t + 1 through uj − 1 as aj−1. (Note: If j = 0, then260

uj = t+ 1, so this step is empty.)261

ii. Set the request at time uj as aj .262

iii. Let t = uj .263

d. For 0 ≤ j ≤ `, let uj = t+ (2ckHk + 2)j .264

e. For 0 ≤ j < `, let yj = 0. If ` < k, increase y` by one.265

Note that if ` is not increased in Step 2b, then this block is regular ; otherwise, it is266

irregular. Let vi denote the number of regular i-blocks, and let v′i denote the number of267

irregular i-blocks. A j-block is an i-plus block if and only if j ≥ i. We first lower bound the268

cost of ALG by the number of blocks.269

I Lemma 9. Every requested block increases E [cost(ALG)] by at least a constant.270

Proof. At every time step, the cache of ALG is missing some page aj . The probability that271

aj is requested in the next block is at least Pr[` = j] ≥ 12cj , so the expected cost of serving272

this block is at least cj · Pr[` = j] = Ω(1). J273

For the rest of the proof, we upper bound the cost of OPT. We first upper bound the274

number of regular blocks, and then we use this to bound the number of irregular blocks.275

I Lemma 10. For every i ∈ 0, 1, . . . , k, we have E [vi] ≤ 2c−im.276

Proof. Consider the potential function φ(y) =∑k−1

i=0 yi ≥ 0. The initial value of φ(y) is 0.277

Notice that whenever a regular block is generated, φ(y) increases by at most 1, and whenever278

an irregular block is generated, φ(y) decreases by at least 2ckHk. Thus, the number of279

irregular blocks is at most the number of regular blocks, so the total number of blocks is at280

most 2m. The lemma follows by noting that the probability that a block is a regular i-block281

is at most c−i. J282

I Lemma 11. For every i ∈ 0, 1, . . . , k, we have E [v′i] ≤ 2mcikHk

.283

Proof. Observe that v′i ≤ 12ckHk

(v′i−1 + vi−1) and v′1 ≤ 12ckHk

v0. Repeatedly applying this284

inequality yields285

E [v′i] ≤i−1∑j=0

E [vj ](2ckHk)i−j

≤i−1∑j=0

2c−jm

(2ckHk)i−j= 2m

ci

i−1∑j=0

1(2kHk)i−j

≤ 2mcikHk

,286

where the second inequality holds due to Lemma 10. J287

ICALP 2020


Now let A denote the entire sequence of requests, B the subsequence of A comprising all288

regular blocks, and m the number of blocks in B. We bound OPT = OPT(A) in terms of289

the optimal cost on B and the number of irregular blocks.290

I Lemma 12. Let OPT(A) and OPT(B) denote the optimal offline algorithm on request291

sequences A and B respectively. Then cost(OPT(A)) ≤ cost(OPT(B)) + 4c∑k

i=0 v′ic

i.292

Proof. Consider the following algorithm ALGA on request sequence A:293

1. For requests in regular blocks, imitate OPT(B). That is, copy the cache contents when294

OPT(B) serves this block.295

2. Upon the arrival of an irregular i-block, let a` denote the page not in the cache.296

a. If ` > i, then the cost of serving this block is 0.297

b. If 1 ≤ ` ≤ i, evict a0 when a` is requested. Then evict a` and fetch a0 at the end of298

this block; the cost of this is 2(ci + 1).299

c. If ` = 0, we evict a1 and fetch a0 when a0 is requested. Then we evict a0 and fetch a1300

when a1 is requested or at the end of this block (if a1 is not requested in this block).301

The cost is 2(c+ 1).302

For each irregular block, notice that the cache of ALGA is the same at the beginning303

and the end of the block. So Step 2 does not influence the imitation in Step 1. The cost of304

serving an irregular i-block is at most 4ci+1. Combining these facts proves the lemma. J305

To bound OPT(B), we divide the sequence B into phases. Each phase is a contiguous306

sequence of blocks. Phases are defined recursively, starting with 0-phases all the way through307

to k-phases. A 0-phase is defined as a single request. For i ≥ 1, let Mi denote the first time308

that an i-plus-block is requested and let Qi denote the first time that c (i− 1)-phases have309

appeared. An i-phase ends immediately after Mi and Qi have both occurred. In other words,310

an i-phase is a minimal contiguous subsequence that contains c (i− 1)-phases and an i-plus311

block. (Notice that for a fixed i, the set of i-phases partition the input sequence.)312

For any k-phase, we upper bound OPT by considering an algorithm ALGkB that is optimal313

for B subject to the additional restriction that a0 is not in the cache at the beginning or end314

of any k-phase. We bound the cost of ALGkB in any k-phase using a more general lemma.315

I Lemma 13. For any i, let ALGiB be an optimal algorithm on B subject to the following: a0316

is not in the cache at the beginning or the end of any i-phase. Then the cost of ALGiB within317

an i-phase is at most 4ci+1. In particular, in each k-phase, the algorithm ALGkB incurs cost318

at most 4ck+1.319

Proof. We shall prove this by induction on i. If i = 0, then the phase under consideration is320

one step. To serve one step, we can evict a1 to serve a0, and then evict a0 if necessary for a321

total cost of 4c. Now assume that the lemma holds for all values in 0, . . . , i − 1. Let si322

denote the first i-plus block; there are two possible cases for the structure of an i-phase:323

1. si appears after the c (i− 1)-phases: In this case, the i-phase ends after this block. Thus,324

one strategy to serve the phase is to evict ai at the beginning and evict a0 when ai is325

requested within si. These two evictions cost at most 4ci+1.326

2. si appears within the first c (i− 1)-phases: By the inductive hypothesis, the algorithm327

can serve these c (i− 1)-phases with total cost at most c · 4ci = 4ci+1. J328

Finally, we lower bound the expected number of blocks in an i-phase. Since the total329

number of blocks is fixed, this allows us to upper bound the number of k-phases in the entire330

sequence. The next proposition forms the technical core of the lower bound:331


I Proposition 14. For i ≥ 1, the expected number of blocks in an i-phase is at least ciHi/4.332

We defer the proof of Proposition 14 to the end of this section; first, we use it to prove333

Theorem 8.334

Proof of Theorem 8. Let OPT(A) denote the cost of an optimal algorithm on the request335

sequence A, and let OPT(B) denote the cost of an optimal algorithm on the regular blocks336

B. Then we have the following:337

E [cost(OPT(A))] ≤ E [cost(OPT(B))] + 4ck∑

i=0ci · E [v′i] (Lemma 12)338

≤ E[cost(ALGk

B)]

+ 4ck∑

i=0ci · 2m

cikHk(Lemma 11)339

≤ 4ck+1 · E [Nk(B)] + 16cmHk

, (Lemma 13)340341

where Nk(B) denotes the number of k-phases in B. According to Proposition 14, the342

expected number of blocks in a k-phase is at least ckHk/4, which implies E [Nk(B)] ≤ 4mckHk

.343

Combining this with the above, we get344

E [cost(OPT(A))] ≤ 16cmHk

+ 16cmHk

= O

(m

Hk

).345

Since any algorithm incurs at least some constant cost in every block by Lemma 9, its cost is346

Ω(m), which concludes the proof. J347

Proof of Proposition 14348

Let zi be a random variable denoting the number of i-plus blocks in a fixed i-phase. We will349

first prove a sequence of three lemmas to yield a lower bound on E [zi].350

I Lemma 15. For any i ≥ 1, we have E [zi] = E [zi−1] + PrMi > Qi.351

Proof. Recall that an i-phase ends once it contains c (i− 1)-phases and an i-plus block. In352

each of the (i− 1)-phases, the expected number of (i− 1)-plus blocks is E [zi−1], so the total353

expected number of (i− 1)-plus blocks in the first c (i− 1)-phases of an i-phase is c ·E [zi−1].354

An elementary calculation shows that an (i − 1)-plus block is an i-plus block with355

probability 1/c. Thus, in expectation, the first c (i−1)-phases of this i-phase contain E [zi−1]356

i-plus blocks.357

If there are no i-plus blocks in the first c (i− 1)-phases, then the i-phase ends as soon358

as an i-plus block appears. In this case, we have zi = 1, and this happens with probability359

exactly PrMi > Qi. Otherwise, the i-phase ends immediately after the c (i− 1)-phases, in360

which case no additional term is added. J361

I Lemma 16. For any i ≥ 1, we have PrMi > Qi ≥ e−2E[zi−1].362

Proof. We let v1, . . . , vc denote the number of i-plus blocks in the first c (i− 1)-phases and363

let V =∑c

i=1 vi. As we saw in the proof of Lemma 15, an (i − 1)-plus block is an i-plus364

block with probability 1/c, so the probability that an (i− 1)-plus block is an (i− 1)-block is365

1− 1/c. Thus, we have366

PrMi > Qi = Ev1,v2,...,vc

[(1− 1

c

)V]≥(

1− 1c

)E[V ]=(

1− 1c

)c·E[zi−1]367

ICALP 2020


where the inequality follows from convexity and the second equality holds due to linearity of368

expectation. The lemma follows from this and the fact that c ≥ 2. J369

I Lemma 17. For any i ≥ 0, we have E [zi] ≥ 14Hi.370

Proof. When i ≤ 4, we have E [zi] ≥ 1 ≥ 14Hi. Now for induction, assume the statement371

holds for j < i, and consider the two possible cases:372

1. If E [zi−1] ≥ 12Hi−1, then Lemma 15 implies E [zi] ≥ E [zi−1] ≥ 1

4Hi.373

2. If E [zi−1] < 12Hi−1 <

12 (1 + ln(i− 1)), then374

E [zi] = E [zi−1] + PrMi > Qi ≥ 14Hi−1 + e−2·E[zi−1], where the equality follows from375

Lemma 15 and the inequality holds by the induction hypothesis and Lemma 16. Thus,376

E [zi] ≥ 14Hi−1 + 1

e ·1

i−1 ≥14Hi. J377

Now let Li denote the number of blocks in an i-phase; recall that our goal is to lower378

bound its expectation by ciHi/4. The following lemma relates Li to zi.379

I Lemma 18. For any i ≥ 0, we have E [Li] = ci · E [zi].380

Proof. When i = 0, the lemma holds because E[L0] = E[z0] = 1, so now we assume i ≥ 1.381

Recall that an i-phase contains at least c (i − 1)-phases, so the expected total number of382

blocks in the first c (i− 1)-phases of this i-phase is c · E [Li−1].383

If there are no i-plus-blocks in these c (i−1)-phases, we need to wait for an i-plus block to384

appear in order for the i-phase to end. This is a geometric random variable with expectation385

ci. Thus, we have: E [Li] = c · E [Li−1] + ci · PrMi > Qi. Applying this recursively,386

E [Li] = ci

i∑j=1

PrMj > Qj+ E [L0]

= ci

i∑j=1

PrMj > Qj+ 1

387

Furthermore, from Lemma 15, we have388

E [zi] = E [zi−1] + PrMi > Qi = E [z0] +i∑

j=1PrMj > Qj = 1 +

i∑j=1

PrMj > Qj.389

Combining the two equalities yields the lemma. J390

We conclude by proving Proposition 14. Fix some i ≥ 1. Using Lemma 18 and Lemma 17,391

we get E [Li] = ci · E [zi] ≥ ciHi

4 .392

3 The `-Strong Lookahead Model393

Now we consider the following prediction model: at each time t, the algorithm can see request394

pt as well as L(t), which is the set of all requests through the `-th distinct request. In other395

words, the algorithm can always see the next contiguous subsequence of ` distinct pages396

(excluding pt) for a fixed value of `. This model was introduced by Albers [1], who (among397

other things) proved the following lower bounds on algorithms with `-strong lookahead.398

I Lemma 19 ([1]). For unweighted paging with `-strong lookahead where ` ≤ k − 2, any399

deterministic algorithm is Ω(k − `)-competitive. For randomized algorithms, the bound is400

Ω(log(k − `)).401


Notice that Lemma 19 implies that for small values of `, `-strong lookahead provides402

no asymptotic improvement to the competitive ratio of any algorithm. The proof proceeds403

by constructing a particular sequence of requests and analyzing the performance of any404

algorithm on this sequence. By slightly modifying the sequence, we can prove a similar result405

for the weighted paging problem.406

I Theorem 20. For weighted paging with `-strong lookahead where n − k + 1 ≤ ` ≤407

n− 1, any deterministic algorithm is Ω(n− `)-competitive, and any randomized algorithm is408

Ω(log(n− `))-competitive.409

Proof. We modify the adversarial input in Lemma 19 as follows: insert n− k − 1 distinct410

pages with very low weight between every two pages. This causes the lookahead to have411

effective size `′ = `− (n− k − 1), because at any point L(t) contains at most `′ pages with412

normal weight. Note that if ` ≤ n− k, then `′ ≤ 1, and from Lemma 19, a lookahead of size413

1 provides no asymptotic benefit to any algorithm.414

If ` ≤ n − 3, then `′ ≤ k − 2. Thus, we can apply Lemma 19 to conclude that for any415

deterministic algorithm, the competitive ratio is Ω(k − `′) = Ω(n − ` − 1), and for any416

randomized algorithm, the competitive ratio is Ω(log(n− `− 1)). Otherwise, if ` ≥ n− 2,417

then the lower bounds continue to hold because when ` = n− 3, they are Ω(1). J418

4 The Strong Per-Request Prediction Model (SPRP)419

In this section, we define a simple algorithm called Static that is 2-competitive when the420

SPRP predictions are always correct. At any time step t, let L(t) denote the set of pages421

in the current prediction. The Static algorithm runs on “batches” of requests. The first422

batch starts at t = 1 and comprises all requests in L(1). The next batch starts once the first423

batch ends, i.e. at |L(1)|+ 1, and comprises all predicted requests at that time, and so on.424

Within each batch, the Static algorithm runs the optimal offline strategy, computed at the425

beginning of the batch on the entire set of requests in the batch.426

I Theorem 21. The Static algorithm is 2-competitive when the predictions from SPRP427

are entirely correct.428

Proof. In this proof, we assume w.l.o.g. that evicting page p costs w(p), and fetches can be429

performed for free. We partition the input into contiguous phases (which do not necessarily430

correspond to the batches of the algorithm) as follows: the first phase is simply the first431

request. Now for any i ≥ 2, phase i is defined as the minimal subsequence of contiguous432

requests that contains all pages requested in phase i − 1, starting with the first request433

arriving after phase i− 1. In other words, if P (i) denotes the set of pages that appear in434

phase i, then we require the P (i) to be the minimal subsequence of contiguous requests that435

satisfy p1 = P (1) ⊆ P (2) ⊆ · · · ⊆ P (m− 1), where m denotes the total number of phases436

and p1 is the first requested page. Note that we may not necessarily have P (m− 1) ⊆ P (m)437

because of termination of the overall sequence.438

Let OPT denote a fixed optimal offline algorithm for the entire sequence, and let OPTi439

denote the cost of OPT incurred in phase i. Similarly, let S denote the total cost of Static,440

and let Si denote the cost that Static incurs in phase i. So we have OPT =∑m

i=1 OPTi441

and S =∑m

i=1 Si.442

Now fix a phase index j ∈ 2, 3, . . . ,m and let R(j) denote the sequence of requests in443

this phase. Furthermore, let C(OPTj−1) and C(Sj−1) denote the cache states of OPT and444

Static immediately before phase j. We know that Static runs an optimal offline algorithm445

ICALP 2020


on R(j). One feasible solution is to immediately change the cache state to C(OPTj−1), and446

then imitate what OPT does in phase j. Since we charge for evictions, we have447

Sj ≤ OPTj +∑

p∈C(Sj−1)\C(OPTj−1)

w(p), for every j ∈ 2, 3, . . . ,m.448

Consider some p ∈ C(Sj−1) \ C(OPTj−1): since p ∈ C(Sj−1), we know p must appear in449

P (j − 1) because Static does not fetch pages that have never been requested. Furthermore,450

since p 6∈ C(OPTj−1), then at some point in phase j − 1, OPT must have evicted p because451

it appeared in P (j − 1) but is not in C(OPTj−1). Thus, Sj ≤ OPTj + OPTj−1. Summing452

over all j ≥ 2 and S1 ≤ OPT1 proves the theorem. J453

5 The SPRP Model with Prediction Errors454

In this section, we consider the SPRP prediction model with the possibility of prediction455

errors. We first define three measurements of error and then prove lower and upper bounds456

on algorithms with imperfect SPRP, in terms of these error measurements.457

Let A denote a prediction sequence of length m, and let B denote an input sequence of458

length n. For any time t, let At and Bt denote the t-th element of A and B, respectively.459

We also define the following for any time step t:460

prev(t): The largest i < t such that Bi = Bt (or 0 if no such if no such i exists).461

next(t): The smallest i > t such that Bi = Bt (or n+ 1 if no such i exists).462

pnext(t): The smallest i > t such that Ai = Bt (or m+ 1 if no such i exists).463

We say two requests Ai = Bj = p can be matched only if pnext(prev(j)) = i. In other464

words, Ai must be the earliest occurrence of p in A after the time of the last p in B before465

Bj . Furthermore, no edges in a matching are allowed to cross.466

First, we define a variant of edit distance between the two sequences.467

I Definition 22. The edit distance `ed between A and B is the total minimum weight of468

unmatched elements of A and B.469

Next, we define an error measure based on the metric 1-norm distance between corresponding470

requests on the standard weighted star metric denoting the weighted paging problem.471

I Definition 23. The 1-norm distance `1 between A and B is defined as follows:472

`1 =n∑

i=1Ai 6=Bi

(w(Ai) + w(Bi)) . (1-norm)473

474

Third, we define an error measure inspired by the PRP model that was also used in [9].475

I Definition 24. The prediction distance `pd between A and B is defined as follows:476

`pd =n∑

i=1w(Bi) · |next(i)− pnext(i)| .477

478

5.1 Lower Bounds479

In this section, we give an overview of the lower bounds stated in Theorems 4, 5, and 6.480

We focus on the `ed (i.e., Theorem 6) error measurement; the proofs for `1 and `pd follow481

similarly. We defer some of the proofs to the full paper.482


Our high-level argument proceeds as follows: recall that in Section 2, we showed a lower483

bound of Ω(k) on the competitive ratio of deterministic PRP-based algorithms. Given an484

SPRP algorithm ALG, we design a PRP algorithm ALG′ specifically for the input generated485

by the procedure described in Section 2. (Recall that this input is a sequence of blocks,486

where a block is a string of a0’s, a1’s, and so on, ending with a single page a` for some `.)487

We show that if ALG has cost o(k) · OPT + o(`ed) (where OPT is the optimal cost of the488

SPRP instance), then ALG′ will have cost o(k) · OPT′ (where OPT′ is the optimal cost of489

the PRP instance), which contradicts our PRP lower bound of Ω(k) on this input. For the490

randomized lower bound, we use the same line of reasoning, but replace Ω(k) with Ω(log k).491

Let k′ denote the cache size of ALG′. Recall that the set of possible page requests received492

by ALG′ is A = a0, a1, . . . , ak′ where w(ai) = ci for some constant c ≥ 2. The oracle ALG,493

maintained by ALG′, has cache size k = k′ + 1. The set of possible requests received by ALG494

is A ∪ b where w(b) = 1/v for some sufficiently large value of v. (Thus, the instance for495

ALG has k + 1 distinct pages.) Our PRP algorithm ALG′ must define a prediction and an496

input sequence for ALG.497

The prediction sequence for ALG: For any strings X and Y , let X + Y denote the498

concatenation of X and Y and let λ · X denote the concatenation of λ copies of X. Let499

L = 2ck′Hk′ + 1, and consider the series of strings: S0 = 2 · a0, and Si = L · Si−1 + ai for500

i ∈ 1, . . . , k′. We fix S := M ·Sk′ , for some sufficiently large M , as the prediction sequence501

for the SPRP algorithm. (Observe that S only contains k distinct pages, and the oracle ALG502

has cache size k.)503

ALG′ and the request sequence for ALG: Our PRP algorithm ALG′ will simultaneously504

construct input for ALG while serving its own requests. Since randomized and fractional505

algorithms are equivalent up to constants (see Bansal et al. [2]), we view the SPRP algorithm506

ALG from a fractional perspective. Let qi ∈ [0, 1] denote the fraction of page ai not in the cache507

of ALG. Notice that the vector q = (q0, q1, . . . , qk′) satisfies∑k′

i=0 qi ≥ 1. (A deterministic508

algorithm is the special case where every qi ∈ 0, 1.) Similarly, let q′ = (q′0, q′1, . . . , q′k′),509

where q′i denotes the amount of request for ai that is not in the cache in ALG′.510

When a block ending with ai is requested, ALG′ scans S for the next appearance of ai.511

It then feeds the scanned portion to ALG, followed by a single request for page b. In this512

case, the prediction error only occurs due to the requests for this page b. After serving this513

request b, the cache of ALG contains at most k′ pages in A. This enables ALG′ to mimic514

the behavior of ALG upon serving the current block. This process continues for every block:515

ALG′ modifies the input by inserting an extra request b into the input for ALG, and mimics516

the resulting cache state of ALG. The details of our algorithm ALG′ are given below:517

1. Initially, let S be the input for ALG and t = 0. (We will modify S as time passes.)518

2. For all 0 ≤ i ≤ k′, let q′i = 1. (Note that the initial value of every qi is also 1.)519

3. On PRP request block si = (a0, a1, . . . , ai) (for some unknown i):520

a. Let q′ = (q′0, q′1, . . . , q′k′) denote the current cache state.521

b. Set q′ = (0,min1, q′0 + q′1, q′2, q′3, . . . , q′k′) to serve a0. Note that after we serve a0, the522

PRP prediction tells us the value of i.523

c. Find the first time t′ after t when S requests ai and set t = t′ + 2.524

d. Change the request at time t into b. (Note that the original request is a0.)525

e. Run ALG until this b is served to obtain a vector q = (q0, q1, . . . , qk′).526

f. If i ≥ 1, set q′ = (min1,∑i

j=0 q′j, 0, 0, . . . , 0, q′i+1, q

′i+2, . . . , q

′k′); this serves the527

requests (a1, a2, . . . , ai).528

g. Set q′ = (q0, q1, . . . , qk′).529

ICALP 2020


Bounding the costs. The main idea in the analysis is the following: since the input530

sequences to ALG and ALG′ are closely related, and they maintain similar cache states, we531

can show that they are coupled both in terms of the algorithm’s cost and the optimal cost.532

Therefore, the ratio of Ω(k) for ALG′ (from Theorem 7) translates to a ratio of Ω(k) for ALG.533

Furthermore, since the only prediction errors are due to the additional requests for page b,534

and this page has a very small weight, the cost of ALG is at least the value of èd. (The same535

line of reasoning is used for randomized algorithms, but Ω(k) is replaced by Ω(log k).)536

We now formalize the above line of reasoning with the following lemmas.537

I Lemma 25. Using any SPRP algorithm ALG as a black box, the PRP algorithm ALG′538

satisfies the following: cost(ALG′) ≤ 2(c+ 1) · cost(ALG).539

Proof. Note that q = q′ at the beginning and end of Step 3. For convenience, let q′ denote540

the vector at the beginning of Step 3, and let q denote the vector at the end of Step 3. Let541

costALG and costALG′ denote the cost of ALG and ALG′ respectively incurred in a fixed Step 3.542

Each time ALG′ enters Step 3, the cost incurred is at most:543

Step 3b: q′0 · (1 + c),544

Step 3f: (q′0 + q′1) · (1 + c) +i∑

j=2q′j · (1 + cj),545

Step 3g:

i∑j=1

qj · (1 + cj)

+

k∑j=i+1

∣∣q′j − qj

∣∣ · (1 + cj)

.546

547548

Summing the above yields the following:549

costALG′ ≤ 2(c+ 1) ·

i∑j=0

cj ·(qj + q′j

)+

k∑j=i+1

cj ·∣∣qj − q′j

∣∣ .550

Now we consider ALG. For each j, at the beginning of Step 3, there is q′j amount of aj551

not in the cache, and at the end of Step 3, there is qj amount of aj not in the cache.552

If j > i, the cost incurred due to aj is at least cj ·∣∣qj − q′j

∣∣. If j ≤ i, ALG′ must serve aj553

at some point in Step 3e, so the incurred cost due to aj is at least cj ·(qj + q′j

). Summing554

the above yields the following:555

costALG ≥

i∑j=0

cj ·(qj + q′j

)+

k∑j=i+1

cj ·∣∣qj − q′j

∣∣ .556

Combining the two inequalities above proves the lemma. J557

Now let OPT denote the optimal SPRP algorithm for the input sequence served by ALG,558

and let OPT′ denote the optimal PRP algorithm for the input sequence served by ALG′. We559

can similarly prove the following lemma (proof in full paper):560

I Lemma 26. The algorithms OPT and OPT′ satisfy cost(OPT) ≤ 2 · cost(OPT′).561

We are now ready to bound the cost of any algorithm with SPRP (proof in full paper):562

I Theorem 27. For weighted paging with SPRP, there is no deterministic algorithm whose563

cost is o(k) · OPT + o(èd), and there is no randomized algorithm whose cost is o(log k) ·564

OPT + o(èd).565


Proof (Sketch). From Theorem 7, we know ALG′ = Ω(k) ·OPT′. Thus, applying Lemmas 25566

and 26, we have ALG = Ω(k) · OPT. Furthermore (as we saw in Section 2), each PRP block567

increases ALG by at least a constant. At the same time, for each block, we can show that èd568

increases by at most 2. As a result, we can conclude that ALG = Ω(`1). The theorem follows569

by combining these facts. For randomized algorithms, the same line of reasoning holds, but570

with Ω(log k) instead of Ω(k). J571

5.2 Upper Bounds572

In this section, we give algorithms whose performance degrades with the value of the SPRP573

error. In particular, we first prove the upper bound in Theorem 6 for the èd measurement,574

and then analyze the Follow algorithm, which proves the upper bound in Theorem 5.575

Now we present an algorithm that uses a cache of size k + 1 whose cost scales linearly576

with OPT + èd. Following our previous terminology, let A denote a prediction sequence of577

length m, and let B denote an input sequence of length n.578

Our algorithm, which we call Learn, relies on an algorithm that we call Idle. At a high579

level, Idle resembles Static (see Section 4): it partitions the prediction sequence A into580

batches and runs an optimal offline algorithm on each batch. The Learn algorithm tracks581

the cost of imitating Idle: if the cost is sufficiently low, then it will imitate Idle on k of its582

cache slots; otherwise, it will simply evict the page in the extra cache slot.583

Before formally defining Idle, we consider a modified version of caching. Our cache584

has k + 1 slots, where one slot is memoryless: it always immediately evicts the page it just585

fetched. In other words, this slot can serve any request, but it cannot store any pages. Let586

OPT+1 denote the optimal algorithm that uses a memoryless cache slot.587

I Lemma 28. For any sequences A and B, cost(OPT+1(A)) ≤ cost(OPT(B)) + 2èd, where588

èd is the edit distance between A and B.589

Proof. Let M denote the optimal matching between A and B (for èd). One algorithm for590

OPT+1(A) is the following: imitate what OPT(B) does for requests matched by M , and use591

the memoryless slot for unmatched requests. The cost of this algorithm is OPT(B)+2èd. J592

Recall that the Static algorithm requires the use of an optimal offline algorithm. Similarly,593

for our new problem with a memoryless cache slot, we require a constant-approximation594

offline algorithm on A. This can be obtained from the following lemma (proof in full paper):595

I Lemma 29. Given a prediction sequence A, there is a randomized offline algorithm whose596

cost is at most a constant times the cost of OPT+1(A).597

The Idle algorithm598

Assume that our cache has size k + 1 and the extra slot is memoryless (as defined above).599

For any time step t, let L(t) denote the set of pages predicted to arrive starting at time600

t+ 1. At time step 1 (i.e., initially), Idle runs the offline algorithm from Lemma 29 on L(1),601

ignoring future requests. After the requests in L(1) have been served, i.e., at time |L(1)|+ 1,602

Idle then consults the predictor and runs the offline algorithm on the next “batch”. The603

algorithm proceeds in this batch-by-batch manner until the end. We can show that the604

competitive ratio of this algorithm is at most a constant (see full paper).605

I Lemma 30. On the prediction sequence A, we have cost(Idle) = O(1) · cost(OPT+1(A)).606

ICALP 2020


The Learn algorithm607

Before defining the algorithm, we introduce another measurement of error that closely608

approximates èd. Recall that A denotes a prediction sequence of length m and B denotes609

an input sequence of length n. In defining èd, two elements Ai = Bj can be matched only if610

pnext(prev(j)) = i, and no matching edges are permitted to cross.611

I Definition 31. The constrained edit distance `′ed is the minimum weight of unmatched612

elements of A and B, with the following additional constraint: if |P (Ai)| ≥ 2, then Ai can613

only be matched with the latest-arriving element in P (Ai).614

We note that `′ed is a constant approximation of èd (proof in full paper):615

I Lemma 32. For any sequences A,B, we have èd ≤ `′ed ≤ 3èd.616

Now we are ready to define the Learn algorithm. For any i ≤ j, we let A(i, j) denote617

the subsequence (Ai, Ai+1, . . . , Aj). For any set (or multiset) of pages S, we let w(S) denote618

the total cost of pages in S. The algorithm is the following:619

1. Let s = 0; the variable s always denotes that we have imitated the Idle algorithm620

through the first s requests of the prediction.621

2. Let S = ∅ be an empty queue.622

3. On the arrival of request p, add p to S.623

a. If there is a t (in [s+ 1, L] where L is the end of the current prediction) such that624

`′ed(A(s+ 1, t), S) < 13(w(A(s+ 1, t)) + w(S)), (1)625

then imitate Idle through position t, empty S and let s = t. (If more than one t626

satisfies the above, select the minimum.)627

b. Otherwise, evict the page in the final slot.628

We first observe that the algorithm is indeed feasible (proof in full paper).629

I Lemma 33. In the Learn algorithm, Step 3a is feasible, i.e., if t satisfies (1), then At = p.630

Now we arrive at the heart of the analysis: we upper bound the cost of Learn against631

the cost of Idle (i.e., a surrogate for OPT(B)) and the constrained edit distance `′ed. In632

particular, we sketch a proof of the following lemma and defer the full proof to the full paper.633

I Lemma 34. The algorithms Learn and Idle satisfy cost(Learn) ≤ cost(Idle) + 12`′ed.634

Proof (Sketch). Let cost1 denote the total cost of Step 3a, and cost2 denote the total635

cost of Step 3b so that cost(Learn) = cost1 + cost2. From the algorithm, we see that636

cost1 ≤ cost(Idle), so now we must prove cost2 ≤ 12`′ed.637

Now we establish some notation. Let `′ed((a, b)(c, d)) = `′ed(A(a, b), B(c, d)), and let638

wA(a, b) = w(A(a, b)) and wB(a, b) = w(B(a, b)).639

We proceed by induction on the number of times we went Step 3a. Consider the first640

time we enter Step 3a; suppose we have read the input B(1, b) and we now imitated Idle641

through A(1, a) for some values a, b. Since the matched edges for `′ed do not cross, there642

exists some c such that `′ed = `′ed(A,B) satisfies643

`′ed = `′ed((1, a), (1, c)) + `′ed((a+ 1,m), (c+ 1, n)).644

We consider the case where c < b; the other cases follow similarly. Let cost(x, y) denote the645

cost incurred by the algorithm when serving B(x, y) and notice that646

cost2 ≤ cost(1, c) + cost(c+ 1, b) + cost(b+ 1, n).647


The cost of serving B(1, c) is at most the weight of the requested pages, so cost(1, c) ≤ wB(1, c).648

Furthermore, we can upper bound cost(c+ 1, b) by a constant times wA(1, a) by analyzing a649

particular matching for `′ed((1, a)(1, c)). Combining this together, we have650

cost(1, c) + cost(c+ 1, b) ≤ 4(wB(1, c) + wA(1, a)) ≤ 12 · `′ed((1, a), (1, c)),651

where the second inequality follows from that we did not enter Step 3a when c arrived.652

Finally, applying the inductive hypothesis to B(b+ 1, n) and substituting the definition of c653

yields the lemma. J654

The proof of Theorem 6 follows from Lemmas 28, 30, and 34.655

The Follow algorithm656

Now we show that the Ω(`1) lower bound in Theorem 5 is tight, that is, we will give an657

SPRP algorithm Follow that has cost O(1) · (OPT + `1). Recall the Static algorithm658

from Theorem 21. The algorithm Follow ignores its input: it simply runs Static on the659

prediction sequence A and imitates its fetches/evictions on the input sequence B.660

I Theorem 35. The Follow algorithm has cost O(1) · (OPT + `1).661

Proof. Recall from Theorem 21 that cost(Static) ≤ O(1) ·OPT(A). Furthermore, we claim662

OPT(A) ≤ OPT(B) + 2`1. This is because on A, there exists an algorithm that imitates the663

movements of B: say at time t, OPT(B) evicts some element b that had appeared in B at664

time v(t). Then OPT(A) can also evict whatever element appeared at time v(t) in A, and if665

this is not b, then this cost can be charged to the v(t) term of `1. Each term of `1 is charged666

at most twice because a specific request can be evicted and fetched at most once respectively.667

By the same argument, we have cost(Follow) ≤ cost(Static) + 2`1. Combining these668

inequalities proves the theorem. J669

6 Conclusion670

In this paper, we initiated the study of weighted paging with predictions. This continues671

the recent line of work in online algorithms with predictions, particularly that of Lykouris672

and Vassilvitski [9] on unweighted paging with predictions. We showed that unlike in673

unweighted paging, neither a fixed lookahead not knowledge of the next request for every674

page is sufficient information for an algorithm to overcome existing lower bounds in weighted675

paging. However, a combination of the two, which we called the strong per request prediction676

(SPRP) model, suffices to give a constant approximation. We also explored the question of677

gracefully degrading algorithms with increasing prediction error, and gave both upper and678

lower bounds for a set of natural measures of prediction error. The reader may note that the679

SPRP model is rather optimistic and requires substantial information about the future. A680

natural question arises: can we obtain constant competitive algorithms for weighted paging681

with fewer predictions? While we refuted this for the PRP and fixed lookahead models, being682

natural choices because they suffice for unweighted paging, it is possible that an entirely683

different parameterization of predictions can also yield positive results for weighted paging.684

We leave this as an intriguing direction for future work.685

ICALP 2020


References686

1 Susanne Albers. The influence of lookahead in competitive paging algorithms. In European687

Symposium on Algorithms, pages 1–12. Springer, 1993.688

2 Nikhil Bansal, Niv Buchbinder, and Joseph Seffi Naor. A primal-dual randomized algorithm689

for weighted paging. Journal of the ACM (JACM), 59(4):19, 2012.690

3 Laszlo A. Belady. A study of replacement algorithms for a virtual-storage computer. IBM691

Systems journal, 5(2):78–101, 1966.692

4 Marek Chrobak, H Karloof, Tom Payne, and S Vishwnathan. New results on server problems.693

SIAM Journal on Discrete Mathematics, 4(2):172–181, 1991.694

5 Amos Fiat, Richard M Karp, Michael Luby, Lyle A McGeoch, Daniel D Sleator, and Neal E695

Young. Competitive paging algorithms. Journal of Algorithms, 12(4):685–699, 1991.696

6 Sreenivas Gollapudi and Debmalya Panigrahi. Online algorithms for rent-or-buy with expert697

advice. In International Conference on Machine Learning, pages 2319–2327, 2019.698

7 Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. Learning-based frequency estimation699

algorithms. In International Conference on Learning Representations, 2019.700

8 Silvio Lattanzi, Thomas Lavastida, Benjamin Moseley, and Sergei Vassilvitskii. Online701

scheduling via learned weights. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete702

Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 1859–1877, 2020.703

9 Thodoris Lykouris and Sergei Vassilvtiskii. Competitive caching with machine learned advice.704

In International Conference on Machine Learning, pages 3302–3311, 2018.705

10 Michael Mitzenmacher. A model for learned bloom filters and optimizing by sandwiching. In706

Advances in Neural Information Processing Systems, pages 464–473, 2018.707

11 Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University708

Press, New York, NY, USA, 1995.709

12 Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ml710

predictions. In Advances in Neural Information Processing Systems, pages 9661–9670, 2018.711

13 Dhruv Rohatgi. Near-optimal bounds for online caching with machine learned advice. In712

Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms,713

SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 1834–1845. SIAM, 2020.714

14 Daniel D Sleator and Robert E Tarjan. Amortized efficiency of list update and paging rules.715

Communications of the ACM, 28(2):202–208, 1985.716

15 Neal Young. On-line caching as cache size varies. In Proceedings of the Second Annual717

ACM-SIAM Symposium on Discrete Algorithms, SODA ’91, pages 241–250, 1991.718

16 Neal E Young. On-line file caching. Algorithmica, 33(3):371–383, 2002.719

online algorithms for weighted paging with predictionsdebmalya/papers/icalp20-paging.pdf · 1...

Documents