
LIRS: An Efficient Replacement Policy to Improve Buffer Cache Performance

Song Jiang¹ and Xiaodong Zhang¹,²

¹College of William and Mary  ²National Science Foundation

The Problem of LRU Replacement

• File scanning: one-time accessed blocks are not evicted in a timely manner;

• Loop-like accesses: the blocks to be accessed soonest can unfortunately be replaced;

• Accesses with distinct frequencies: frequently accessed blocks can unfortunately be replaced.

Inability to cope with weak access locality

Why does LRU Fail Sometimes?

• A recently used block will not necessarily be used again soon.

• LRU cannot deal with a working set larger than the available cache size.
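As a concrete illustration of the loop problem (a minimal sketch of our own, not from the deck): with a cache of N blocks and a cyclic pattern over N + 1 blocks, LRU always evicts exactly the block the loop will touch next, so every access misses.

```python
from collections import OrderedDict

def lru_hits(trace, cache_size):
    """Simulate LRU on a block-access trace and count hits."""
    cache = OrderedDict()                  # keys kept in LRU order, oldest first
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # block becomes most recently used
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits

# A loop over 6 blocks with a 5-block cache: 0 hits out of 60 accesses.
print(lru_hits(list(range(6)) * 10, cache_size=5))
```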

LRU Merits

• Simplicity: affordable implementation

• Adaptability: responsive to access pattern changes

Our Objectives

• Address the limits of LRU fundamentally.

• Retain the low-overhead and adaptability merits of LRU.

Significant efforts have been made to improve LRU, but they are either case-by-case fixes or carry high runtime overhead.

Outline

• Related Work

• The LIRS Algorithm

• LIRS Implementation Using LRU Stack

• Performance Evaluation

• Sensitivity and Overhead Analysis

• Conclusions

Related Work

• Aided by user-level hints

• Detection and adaptation of access regularities

• Tracing and utilizing deeper history information

User-level Hints

• Application-controlled file caching [Cao et al, USENIX’94]

• Application-informed prefetching and caching [Patterson et al, SOSP’95]

Rely on users’ understanding of data access patterns

Detection and Adaptation of Regularities

• SEQ: sequential access pattern detection [Glass et al, Sigmetrics’97]

• EELRU: on-line analysis of aggregate recency distributions of referenced blocks [Smaragdakis et al, Sigmetrics’99]

• DEAR: detection of multiple block reference patterns [Choi et al, USENIX’99]

• AFC: Application/File-level Characterization [Choi et al, Sigmetrics’00]

• UBM: Unified Buffer Management [Kim et al, OSDI’00]

These are case-by-case approaches.

Tracing and Utilizing Access History

• LRFU: combine LRU and LFU [Lee et al, Sigmetrics’99]

• LRU-K: replacement decision based on the time of the Kth-to-last reference [ O'Neil et al, Sigmod’93]

• 2Q: use two queues to quickly remove cold blocks [Johnson et al, VLDB’94]

Either high implementation cost or workload-dependent performance.

Outline

• Related Work

• The LIRS Algorithm

• LIRS Implementation Using LRU Stack

• Performance Evaluation

• Sensitivity and Overhead Analysis

• Conclusions

Observation of Data Flow in LRU Stack

• Blocks are ordered by recency in the LRU stack;

• Blocks enter from stack top, and leave from its bottom;

A block evicted from the bottom of the stack should have been evicted much earlier!

[Figure: an LRU stack; blocks enter at the top and are ordered by recency toward the bottom]

Inter-Reference Recency (IRR)

IRR of a block: the number of other unique blocks accessed between two consecutive references to the block.

Recency (R): the number of other unique blocks accessed from the block's last reference to the current time.

Example access sequence: 1 2 3 4 3 1 5 6 5

For block 1: IRR = 3 (blocks 2, 3, 4 are accessed between its two references) and R = 2 (blocks 5, 6 are accessed after its last reference).
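Since these two quantities drive the whole policy, a minimal helper (our own illustrative Python, not from the slides) makes the definitions concrete:

```python
def recency(trace, block, now):
    """Unique other blocks accessed after `block`'s last reference, up to time `now`."""
    prefix = trace[:now]
    if block not in prefix:
        return None                          # never referenced: recency undefined
    last = len(prefix) - 1 - prefix[::-1].index(block)
    return len(set(prefix[last + 1:]) - {block})

def irr(trace, block, now):
    """Unique other blocks accessed between `block`'s last two references."""
    prefix = trace[:now]
    refs = [i for i, b in enumerate(prefix) if b == block]
    if len(refs) < 2:
        return float('inf')                  # referenced at most once: IRR = infinity
    return len(set(prefix[refs[-2] + 1:refs[-1]]) - {block})

trace = [1, 2, 3, 4, 3, 1, 5, 6, 5]
print(irr(trace, 1, len(trace)))             # 3: blocks 2, 3, 4 between the two 1's
print(recency(trace, 1, len(trace)))         # 2: blocks 5, 6 after the last 1
```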

Principles of Our Replacement

• If a block’s IRR is high, its next IRR is likely to be high again. We select the blocks with high IRRs for replacement.

• Once an IRR is out of date, we rely on the recency.

LIRS: the Low Inter-reference Recency Set replacement policy. We keep the blocks with low IRRs in cache.

Basic LIRS Idea: Keep LIR Blocks in Cache

Blocks are classified into a low-IRR (LIR) set and a high-IRR (HIR) set. The physical cache of size L is partitioned between them:

L = Llirs + Lhirs

where the LIR block set holds Llirs blocks and the HIR block set holds Lhirs blocks.

An Example for LIRS

Llirs = 2, Lhirs = 1

Block \ V time   1  2  3  4  5  6  7  8  9  10 |  R  IRR
A                X              X     X        |  1   1
B                      X     X                 |  3   1
C                         X                    |  4  inf
D                   X              X           |  2   3
E                                        X     |  0  inf

LIR block set = {A, B}, HIR block set = {C, D, E}

Mapping to the Cache Block Sets (Llirs = 2, Lhirs = 1)

[Figure: LIR blocks A and B and HIR block E occupy the physical cache; HIR blocks C and D are non-resident]

D is referenced at time 10

Block \ V time   1  2  3  4  5  6  7  8  9  10 |  R  IRR
A                X              X     X        |  2   1
B                      X     X                 |  3   1
C                         X                    |  4  inf
D                   X              X         X |  0   2
E                                        X     |  1  inf

Which block is replaced? An HIR block: the resident HIR block (E) is replaced!

How is the LIR set updated? By comparing against the recency of the LIR blocks: D's new IRR (2) is smaller than the maximum recency of the LIR blocks (B's recency, 3), so D enters the LIR set and B is demoted to HIR status.

After D is referenced at time 10: E is replaced, D enters the LIR set.

If the reference at time 10 is to C instead:

Block \ V time   1  2  3  4  5  6  7  8  9  10 |  R  IRR
A                X              X     X        |  2   1
B                      X     X                 |  4   1
C                         X                  X |  0   4
D                   X              X           |  3   3
E                                        X     |  1  inf

E is replaced, but C cannot enter the LIR set: its new IRR (4) is larger than the maximum recency of the LIR blocks.
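As a cross-check, the helper from the IRR slide reproduces the R and IRR columns. The access sequence itself did not survive extraction; A, D, B, C, B, A, D, A, E at times 1 to 9 is the sequence consistent with the tables (our inference):

```python
trace = list('ADBCBADAE')        # virtual times 1..9 (inferred from the tables)
for b in 'ABCDE':
    print(b, recency(trace, b, 9), irr(trace, b, 9))
# A 1 1 | B 3 1 | C 4 inf | D 2 3 | E 0 inf  -- matches the first table
```

Appending D (or C) as the access at time 10 and recomputing with `now=10` reproduces the other two tables.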

The Power of LIRS Replacement

• File scanning: one-time accessed blocks will be evicted in a timely manner;

• Loop-like accesses: the blocks to be accessed soonest will NOT be replaced;

• Accesses with distinct frequencies: frequently accessed blocks will NOT be replaced.

Capability to cope with weak access locality

Outline

• Related Work

• The LIRS Algorithm

• LIRS Implementation Using LRU Stack

• Performance Evaluation

• Sensitivity and Overhead Analysis

• Conclusions

LIRS Efficiency: O(1)

Each decision compares Rmax (the maximum recency of the LIR blocks) with IRR_HIR (the new IRR of the referenced HIR block).

This efficiency is achieved by our LIRS stack:

LRU stack + the LIR block with Rmax recency at its bottom ==> LIRS stack

Differences between LRU and LIRS Stacks

[Figure: an LRU stack of 5 resident blocks beside a LIRS stack holding both LIR and HIR blocks; cache size L = 5, Llirs = 3, Lhirs = 2]

• The stack size of LRU is decided by the cache size and is fixed; the stack size of LIRS is decided by the LIR block with Rmax recency and varies.

• The LRU stack holds only resident blocks; the LIRS stack holds any block whose recency is no more than Rmax.

• The LRU stack does not distinguish "hot" and "cold" blocks in it; the LIRS stack distinguishes LIR and HIR blocks in it, and dynamically maintains their statuses.

How Does the LIRS Stack Help?

Rmax: the maximum recency of the LIR blocks. IRR_HIR: the new IRR of a referenced HIR block.

Blocks in the LIRS stack ==> new IRR < Rmax

Other blocks ==> new IRR > Rmax

So whether a referenced block deserves LIR status reduces to an O(1) test of whether it is in the LIRS stack.

LIRS Operations

[Figure: LIRS stack S holding LIR and HIR entries, and stack Q holding the resident HIR blocks; cache size L = 5, Llirs = 3, Lhirs = 2]

• Initialization: all referenced blocks are given LIR status until the LIR block set is full. We place resident HIR blocks in stack Q.

• Upon accessing an LIR block (a hit)

• Upon accessing a resident HIR block (a hit)

• Upon accessing a non-resident HIR block (a miss)
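Before the case-by-case figures, here is a compact sketch of the three operations (our own hedged reading of the policy as presented on these slides; the class name `LIRSCache` and all identifiers are ours, and Q is modeled as a FIFO):

```python
from collections import OrderedDict

class LIRSCache:
    """Illustrative LIRS sketch. S: the LIRS stack, bottom (oldest) first.
    Q: the resident HIR blocks in FIFO order. Not the authors' code."""

    def __init__(self, size, hir_fraction=0.01):
        self.Lhirs = max(1, int(size * hir_fraction))  # ~1% of cache for HIR blocks
        self.Llirs = size - self.Lhirs                 # ~99% for LIR blocks
        self.S = OrderedDict()   # block -> 'LIR' or 'HIR'; last entry = stack top
        self.Q = OrderedDict()   # resident HIR blocks
        self.n_lir = 0           # current size of the LIR set

    def _prune(self):
        # Stack pruning: pop bottom entries of S until an LIR block is at the bottom.
        while self.S and next(iter(self.S.values())) != 'LIR':
            self.S.popitem(last=False)

    def _demote_bottom_lir(self):
        # The LIR block with Rmax recency (bottom of S) becomes a resident
        # HIR block and moves to the end of Q.
        bottom, _ = self.S.popitem(last=False)
        self.Q[bottom] = True
        self.n_lir -= 1

    def _promote(self, x):
        # x's new IRR is below Rmax (or we are initializing): x joins the LIR set.
        self.S[x] = 'LIR'
        self.S.move_to_end(x)
        self.n_lir += 1
        if self.n_lir > self.Llirs:
            self._demote_bottom_lir()
        self._prune()

    def access(self, x):
        """Reference block x; return True on a hit, False on a miss."""
        if self.S.get(x) == 'LIR':            # hit on an LIR block
            self.S.move_to_end(x)             # move to the top of S
            self._prune()                     # it may have left the bottom
            return True
        if x in self.Q:                       # hit on a resident HIR block
            if x in self.S:                   # recency < Rmax: promote to LIR
                del self.Q[x]
                self._promote(x)
            else:                             # keeps HIR status
                self.S[x] = 'HIR'             # re-enters S at the top
                self.Q.move_to_end(x)
            return True
        # Miss: evict the resident HIR block at the front of Q if the cache is full.
        if self.n_lir + len(self.Q) >= self.Llirs + self.Lhirs and self.Q:
            self.Q.popitem(last=False)
        if self.n_lir < self.Llirs:           # initialization: fill the LIR set
            self._promote(x)
        elif x in self.S:                     # still in S: new IRR < Rmax
            self._promote(x)
        else:                                 # cold block: resident HIR
            self.S[x] = 'HIR'
            self.Q[x] = True
        return False
```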

Access an LIR Block (a Hit)

Move the block to the top of stack S. If the block was at the bottom of S, conduct stack pruning so that an LIR block is at the bottom again.

[Figure: S and Q before and after accessing LIR blocks 4 and 8; L = 5, Llirs = 3, Lhirs = 2]

Access a Resident HIR Block (a Hit)

Move the block to the top of S. If it was already in S, its new IRR is below Rmax: it turns into an LIR block, leaves Q, and the LIR block at the bottom of S is demoted to a resident HIR block at the end of Q; S is then pruned. If it was not in S, it keeps its HIR status and moves to the end of Q.

[Figure: S and Q before and after accessing resident HIR blocks 3 and 5; L = 5, Llirs = 3, Lhirs = 2]

Access a Non-Resident HIR Block (a Miss)

Evict the resident HIR block at the front of Q to free a buffer, load the requested block, and push it onto the top of S.

[Figure: S and Q before and after accessing block 7; L = 5, Llirs = 3, Lhirs = 2]

Access a Non-Resident HIR Block (a Miss, Cont.)

If the missed block is still in S, its new IRR is below Rmax, so it becomes an LIR block and the LIR block at the bottom of S is demoted to the end of Q; otherwise it is added to the top of S and to the end of Q as a resident HIR block.

[Figure: S and Q before and after accessing blocks 9 and 5; L = 5, Llirs = 3, Lhirs = 2]
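The exact stack contents in the figures above did not survive extraction, so as a usage example we can replay the earlier A-E trace through the sketch. Because the sketch cold-starts (the first two distinct blocks seed the LIR set), intermediate states differ from the snapshot tables, but the final LIR set agrees with the walkthrough:

```python
cache = LIRSCache(size=3, hir_fraction=0.34)       # Llirs = 2, Lhirs = 1, as in the example
for t, block in enumerate('ADBCBADAED', start=1):  # example trace, D referenced at time 10
    print(t, block, 'hit' if cache.access(block) else 'miss')
# the sketch hits at times 6, 7, 8 and 10
print([b for b, s in cache.S.items() if s == 'LIR'])   # -> ['A', 'D']: D entered the LIR set
```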

Outline

• Related Work

• The LIRS Algorithm

• LIRS Implementation Using LRU Stack

• Performance Evaluation

• Sensitivity and Overhead Analysis

• Conclusions

Workload Traces

• cpp is a GNU C compiler pre-processor trace.

• cs is an interactive C source program examination tool trace.

• glimpse is a text information retrieval utility trace.

• postgres is a trace of join queries among four relations in a relational database system.

• sprite is from the Sprite network file system.

• multi1 is obtained by executing two workloads, cs and cpp, together.

• multi2 is obtained by executing three workloads, cs, cpp, and postgres, together.

Representative Access patterns

• Looping references: all blocks are accessed repeatedly with a regular interval;

• Temporally-clustered references: blocks accessed more recently are the ones more likely to be accessed again soon;

• Probabilistic references: each block has a stationary reference probability, and all blocks are accessed independently with their associated probabilities.

Cache Partition

• 1% of the cache size is for HIR blocks;

• 99% of the cache size is for LIR blocks;

• Performance is not sensitive to the exact partition.

Looping Pattern: cs (Time-space map)

Looping Pattern: cs (Hit Rates)

Looping Pattern: postgres (Time-space map)

Looping Pattern: postgres (Hit Rates)

Probabilistic Pattern: cpp (Time-space map)

Probabilistic Pattern: cpp (Hit Rates)

Temporally-Clustered Pattern: sprite (Time-space map)

Temporally-Clustered Pattern: sprite (Hit Rates)

Mixed Pattern: multi1 (Time-space map)

Mixed Pattern: multi1 (Hit Rates)

Mixed Pattern: multi2 (Time-space map)

Mixed Pattern: multi2 (Hit Rates)

Outline

• Related Work

• The LIRS Algorithm

• LIRS Implementation Using LRU Stack

• Performance Evaluation

• Sensitivity and Overhead Analysis

• Conclusions

Sensitivity to the Change of Lhirs

LIRS with Limited Stack Sizes

Conclusions

• LIRS effectively uses deeper access history without explicit regularity detection or high-cost operations.

• It outperforms existing replacement policies.

• Its implementation is as simple as LRU's.

• It is applicable to virtual memory and database buffer management.
