Cache Replacement Policy Using Map-based Adaptive Insertion

Yasuo Ishii 1,2, Mary Inaba 1, and Kei Hiraki 1
1 The University of Tokyo, 2 NEC Corporation


Post on 24-Feb-2016


Page 1: Cache Replacement Policy Using Map-based Adaptive Insertion

Cache Replacement Policy Using Map-based Adaptive Insertion

Yasuo Ishii 1,2, Mary Inaba 1, and Kei Hiraki 1

1 The University of Tokyo, 2 NEC Corporation

Page 2: Cache Replacement Policy Using Map-based Adaptive Insertion

Introduction

Modern computers have multi-level cache systems

Performance improvement of the LLC is the key to achieving high performance

The LLC stores many dead-blocks
    Elimination of dead-blocks in the LLC improves system performance

[Figure: memory hierarchy of CORE, L1, L2, LLC (L3), and Memory]

Page 3: Cache Replacement Policy Using Map-based Adaptive Insertion

Introduction

Many multi-core systems adopt a shared LLC

A shared LLC raises issues
    Thrashing by other threads
    Fairness of the shared resource

Dead-block elimination is more effective for multi-core systems

[Figure: CORE 1, CORE 2, ..., CORE N, each with its own L1 and L2, sharing the LLC (L3) in front of Memory]

Page 4: Cache Replacement Policy Using Map-based Adaptive Insertion

Trade-offs of Prior Works

Policy              | Replacement Algorithm           | Dead-block Elimination | Additional HW Cost
LRU                 | Insert to MRU                   | None                   | None
DIP [2007 Qureshi+] | Partially random insertion      | Some                   | Several counters (Light)
LRF [2009 Xiang+]   | Predicts from reference pattern | Strong                 | Shadow tag, PHT (Heavy)

Problem of dead-block prediction
    Inefficient use of data structure (cf. shadow tag)

Page 5: Cache Replacement Policy Using Map-based Adaptive Insertion

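Map-based Data Structure

Shadow tag
    40 bit/tag, one tag per accessed line
    Cost: 40 bit/line

Map-based history
    40 bit/tag, 1 bit/line
    Cost: 15.3 bit/line (= (40 b + 6 b) / 3 lines)

Map-based data structure improves cost-efficiency when there is spatial locality

[Figure: three accesses falling into one zone of the memory address space (zone size, line size); the shadow tag keeps a tag per accessed line, while the map keeps one tag and per-line I/A states]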
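The per-line cost comparison above can be checked with a few lines of arithmetic. This is only a sketch: it assumes, as the slide's numbers suggest, that one 40-bit tag plus 6 bits of map state are shared by 3 tracked lines. The function names are illustrative and do not come from the slides.

```python
def shadow_tag_bits_per_line(tag_bits: int = 40) -> float:
    """A shadow tag stores one full tag for every tracked line."""
    return float(tag_bits)


def map_bits_per_line(tag_bits: int, state_bits: int, tracked_lines: int) -> float:
    """A map entry shares one tag and its state bits across the lines it tracks."""
    if tracked_lines <= 0:
        raise ValueError("tracked_lines must be positive")
    return (tag_bits + state_bits) / tracked_lines


shadow = shadow_tag_bits_per_line()
# Assumed reading of the slide: 40-bit tag + 6 state bits over 3 accessed lines.
mapped = map_bits_per_line(tag_bits=40, state_bits=6, tracked_lines=3)
print(f"shadow tag: {shadow:.1f} bit/line, map: {mapped:.1f} bit/line")
```

The more lines of a zone are actually accessed, the further the shared tag is amortized, which is why the benefit depends on spatial locality.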

Page 6: Cache Replacement Policy Using Map-based Adaptive Insertion

Map-based Adaptive Insertion (MAIP)

Modifies insertion position, from low reuse possibility to high reuse possibility
    (1) Cache bypass
    (2) LRU position
    (3) Middle of MRU/LRU
    (4) MRU position

Adopts map-based data structure for tracking many memory accesses

Exploits two localities for reuse possibility estimation

Page 7: Cache Replacement Policy Using Map-based Adaptive Insertion

Hardware Implementation

Memory access map
    Collects memory access history & memory reuse history

Bypass filter table
    Collects data reuse frequency of memory access instructions

Reuse possibility estimation
    Estimates reuse possibility from information of other components

[Figure: block diagram with Memory Access Map, Bypass Filter Table, Estimation Logic, and Last Level Cache; labels: Memory Access Information, Insertion Position]

Page 8: Cache Replacement Policy Using Map-based Adaptive Insertion

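Memory Access Map (1)

Detects one piece of information: (1) Data reuse
    Has the accessed line been touched before?

State diagram
    Init -> Access on first touch
    Access -> Access on data reuse

[Figure: a zone of the memory address space divided into lines (zone size, line size); each line starts in state I (Init) and changes to state A (Access) when it is accessed]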
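The data-reuse check described above can be sketched as a per-line state array. This is a minimal illustration, not the hardware design: the 64-byte line and 256 lines per zone are taken from a later slide, and the function name `touch` is made up for this example.

```python
INIT, ACCESS = "I", "A"  # the two per-line states shown in the state diagram

LINE_SIZE = 64        # bytes per cache line (value from a later slide)
LINES_PER_ZONE = 256  # line states per map entry (value from a later slide)


def touch(zone_states: list, address: int) -> bool:
    """Record an access and report whether it is a data reuse.

    Returns False on a first touch (Init -> Access) and True when the
    line was already in the Access state.
    """
    line = (address // LINE_SIZE) % LINES_PER_ZONE
    reused = zone_states[line] == ACCESS
    zone_states[line] = ACCESS
    return reused


zone = [INIT] * LINES_PER_ZONE
first = touch(zone, 0x1000)   # first touch of this line
second = touch(zone, 0x1008)  # same 64-byte line, so this is a reuse
other = touch(zone, 0x1040)   # neighboring line, first touch
```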

Page 9: Cache Replacement Policy Using Map-based Adaptive Insertion

Memory Access Map (2)

Detects one statistic: (2) Spatial locality
    How often are the neighboring lines reused?

Attaches counters to each access map to detect spatial locality
    Access Count
    Reuse Count

Data Reuse Metric = Reuse Count / Access Count

[Figure: a map entry holding MapTag, Access Map, Access Count, and Reuse Count, next to the Init/Access state diagram]
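A minimal sketch of the two counters and the data reuse metric follows. It assumes that Access Count is incremented on a first touch and Reuse Count on every later access to an already touched line; the slides do not state this explicitly, and counter widths and saturation are left out. `MapEntry` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class MapEntry:
    """One zone of the memory access map with its two counters (illustrative)."""
    touched: set = field(default_factory=set)  # line indexes in the Access state
    access_count: int = 0  # assumed: counts first touches (Init -> Access)
    reuse_count: int = 0   # assumed: counts accesses to already touched lines

    def record(self, line_index: int) -> None:
        if line_index in self.touched:
            self.reuse_count += 1
        else:
            self.touched.add(line_index)
            self.access_count += 1

    def data_reuse_metric(self) -> float:
        """Reuse Count / Access Count; 0.0 before any access."""
        if self.access_count == 0:
            return 0.0
        return self.reuse_count / self.access_count


entry = MapEntry()
for line in (3, 3, 3, 4):
    entry.record(line)
metric = entry.data_reuse_metric()  # 2 reuses over 2 first touches
```

Because the counters belong to the whole zone, a line that has never been accessed still inherits an estimate from its neighbors.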

Page 10: Cache Replacement Policy Using Map-based Adaptive Insertion

Memory Access Map (3)

Implementation
    Maps are stored in a cache-like structure

Cost-efficiency
    Each entry has 256 states
    Each entry tracks 16 KB of memory (16 KB = 64 B x 256 states)
    Requires ~1.2 bit for tracking 1 cache line in the best case

[Figure: the memory address is split into MapTag, MapIndex, MapOffset, and CacheOffset; tag comparison selects an entry and a MUX selects one of its states. Width labels in the figure: 256, 30, 30, 4, 8]
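The address lookup can be sketched as a bit-field split. The 6-bit cache offset follows from the 64 B line and the 8-bit map offset from the 256 states per entry; the 4-bit map index and 30-bit map tag are an assumed reading of the width labels in the figure. `split_address` is a hypothetical helper.

```python
CACHE_OFFSET_BITS = 6  # 64-byte cache line
MAP_OFFSET_BITS = 8    # 256 line states per map entry
MAP_INDEX_BITS = 4     # assumed from the figure label "4"
MAP_TAG_BITS = 30      # assumed from the figure label "30"


def split_address(address: int) -> dict:
    """Split a memory address into the four fields used by the map lookup."""
    cache_offset = address & ((1 << CACHE_OFFSET_BITS) - 1)
    address >>= CACHE_OFFSET_BITS
    map_offset = address & ((1 << MAP_OFFSET_BITS) - 1)
    address >>= MAP_OFFSET_BITS
    map_index = address & ((1 << MAP_INDEX_BITS) - 1)
    address >>= MAP_INDEX_BITS
    map_tag = address & ((1 << MAP_TAG_BITS) - 1)
    return {
        "map_tag": map_tag,
        "map_index": map_index,
        "map_offset": map_offset,
        "cache_offset": cache_offset,
    }


fields = split_address(0x1234_5678_9ABC)
# Bytes covered by one entry: line size times states per entry.
zone_bytes = (1 << CACHE_OFFSET_BITS) * (1 << MAP_OFFSET_BITS)
```

Under these assumed widths the four fields add up to a 48-bit address, and one entry covers 16 KB as stated on the slide.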

Page 11: Cache Replacement Policy Using Map-based Adaptive Insertion

Bypass Filter Table

Detects one statistic: (3) Temporal locality
    How often does the instruction reuse data?

Each entry is a saturating counter
    Counts up on data reuse / counts down on first touch

Bypass Filter Table (8-bit x 512-entry), looked up with the program counter

Counter states, from rarely reused to frequently reused
    BYPASS, USELESS, NORMAL, USEFUL, REUSE
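A rough sketch of such a table is shown below. The table size, counter width, update rule, and state names come from the slide; the state thresholds, the initial counter value, and the indexing by the low bits of the program counter are assumptions made only for illustration.

```python
from bisect import bisect_right

ENTRIES = 512
COUNTER_MAX = 255  # 8-bit saturating counter

# Hypothetical thresholds: the slides name the five states but not their ranges.
STATE_BOUNDS = [32, 96, 160, 224]
STATE_NAMES = ["BYPASS", "USELESS", "NORMAL", "USEFUL", "REUSE"]


class BypassFilterTable:
    def __init__(self) -> None:
        # Assumed initial value: the middle of the range (NORMAL).
        self.counters = [128] * ENTRIES

    @staticmethod
    def index(pc: int) -> int:
        # Assumed indexing: low bits of the program counter.
        return pc % ENTRIES

    def update(self, pc: int, reused: bool) -> None:
        """Count up on data reuse, count down on first touch, with saturation."""
        i = self.index(pc)
        if reused:
            self.counters[i] = min(COUNTER_MAX, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

    def state(self, pc: int) -> str:
        value = self.counters[self.index(pc)]
        return STATE_NAMES[bisect_right(STATE_BOUNDS, value)]


table = BypassFilterTable()
for _ in range(300):
    table.update(0x400, reused=False)  # a load that only ever streams
    table.update(0x401, reused=True)   # a load whose data is always reused
```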

Page 12: Cache Replacement Policy Using Map-based Adaptive Insertion

Reuse Possibility Estimation Logic

Uses 2 localities & data reuse information

Data reuse
    Hit / miss of the corresponding lookup of the LLC
    Corresponding state of the Memory Access Map

Spatial locality of data reuse
    Reuse frequency of neighboring lines

Temporal locality of memory access instruction
    Reuse frequency of the corresponding instruction

Combines information to decide insertion policy

Page 13: Cache Replacement Policy Using Map-based Adaptive Insertion

Additional Optimization

Adaptive dedicated set reduction (ADSR)
    Enhancement of set dueling [2007 Qureshi+]
    Reduces dedicated sets when PSEL is strongly biased

[Figure: Set 0 to Set 7 shown twice, before and after the reduction; legend: LRU Dedicated Set, MAIP Dedicated Set, Additional Follower, Follower Set]
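One possible reading of ADSR is sketched below. The slides only say that dedicated sets are reduced when PSEL is strongly biased, so everything else here is an assumption: the layout of dedicated sets, the bias margin, the PSEL update direction, and the choice to release every other dedicated set of the losing policy as an additional follower. The 10-bit PSEL width matches the policy selection counter size given in the evaluation setup.

```python
from typing import Optional

PSEL_BITS = 10
PSEL_MAX = (1 << PSEL_BITS) - 1
PSEL_MID = 1 << (PSEL_BITS - 1)
BIAS_MARGIN = 384  # hypothetical distance from the midpoint that counts as "strongly biased"
DUEL_PERIOD = 32   # hypothetical: one LRU and one MAIP dedicated set per 32 sets


def winner(psel: int) -> str:
    """Policy used by follower sets; a high PSEL means LRU sets miss more."""
    return "MAIP" if psel >= PSEL_MID else "LRU"


def dedicated_owner(set_index: int, psel: int) -> Optional[str]:
    """Return "LRU" or "MAIP" for a dedicated set, or None for a follower."""
    slot = set_index % DUEL_PERIOD
    if slot == 0:
        owner = "LRU"
    elif slot == 1:
        owner = "MAIP"
    else:
        return None
    strongly_biased = abs(psel - PSEL_MID) >= BIAS_MARGIN
    if strongly_biased and owner != winner(psel) and (set_index // DUEL_PERIOD) % 2 == 1:
        return None  # assumed: released as an additional follower
    return owner


def policy_for_set(set_index: int, psel: int) -> str:
    return dedicated_owner(set_index, psel) or winner(psel)


def on_miss(set_index: int, psel: int) -> int:
    """Update PSEL on a miss; only sets that are still dedicated vote."""
    owner = dedicated_owner(set_index, psel)
    if owner == "LRU":
        return min(PSEL_MAX, psel + 1)
    if owner == "MAIP":
        return max(0, psel - 1)
    return psel
```

With this layout, a strongly MAIP-biased PSEL turns half of the LRU dedicated sets into followers, so fewer sets are forced to run the losing policy.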

Page 14: Cache Replacement Policy Using Map-based Adaptive Insertion

Evaluation

Benchmark
    SPEC CPU2006, compiled with GCC 4.2
    Evaluates 100M instructions (skips 40G inst.)

MAIP configuration (per-core resource)
    Memory Access Map: 192 entries, 12-way
    Bypass Filter: 512 entries, 8-bit counters
    Policy selection counter: 10 bit

Evaluates DIP & TADIP-F for comparison

Page 15: Cache Replacement Policy Using Map-based Adaptive Insertion

Cache Miss Count (1-core)

MAIP reduces MPKI by 8.3% from LRU
OPT reduces MPKI by 18.2% from LRU

[Figure: misses per 1000 instructions (axis 0 to 60) for LRU, DIP, MAIP, and OPT on 400.perl, 401.bzip, 429.mcf, 433.milc, 434.zeus, 436.cact, 437.lesl, 450.sopl, 456.hmme, 459.Gems, 462.libq, 464.h264, 470.lbm, 471.omne, 473.asta, 481.wrf, 482.sphi, 483.xala, and the average]
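For reference, MPKI (misses per 1000 instructions) and a relative reduction against a baseline can be computed as in the sketch below. The miss counts are made-up inputs chosen only to show the arithmetic; they are not results from the slides, and the function names are illustrative.

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per 1000 instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return misses * 1000 / instructions


def reduction_percent(baseline: float, improved: float) -> float:
    """Relative reduction of a metric with respect to a baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100


# Hypothetical miss counts over a 100M-instruction run.
baseline_mpki = mpki(misses=2_000_000, instructions=100_000_000)
improved_mpki = mpki(misses=1_500_000, instructions=100_000_000)
saving = reduction_percent(baseline_mpki, improved_mpki)
```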

Page 16: Cache Replacement Policy Using Map-based Adaptive Insertion

Speedup (1-core & 4-core)

[Figure, 4-core result: weighted speedup (axis -6% to 18%) of TADIP and MAIP for 15 workload mixes and their geometric mean. Mixes, by SPEC benchmark number: 403-429-433-483, 429-450-456-482, 401-434-456-470, 450-464-473-483, 401-433-450-462, 401-450-450-482, 403-434-450-464, 403-456-459-473, 434-450-482-483, 400-429-473-483, 400-450-456-462, 433-434-450-462, 433-450-470-483, 433-434-450-462, 400-416-456-464]

[Figure, 1-core result: speedup (axis -10% to 20%) of DIP and MAIP for each benchmark and their geometric mean]

Page 17: Cache Replacement Policy Using Map-based Adaptive Insertion

Cost Efficiency of Memory Access Map

Requires 1.9 bit/line on average
    ~20 times better than that of shadow tag

Covers >1.00 MB (LLC) in 9 of 18 benchmarks

Covers >0.25 MB (MLC) in 14 of 18 benchmarks

[Figure: covered area (MB), axis 0.0 to 3.0, per benchmark and on average]

Page 18: Cache Replacement Policy Using Map-based Adaptive Insertion

Related Work

Uses spatial / temporal locality
    Using spatial locality [1997, Johnson+]
    Using different types of locality [1995, González+]

Prediction-based dead-block elimination
    Dead-block prediction [2001, Lai+]
    Less Reused Filter [2009, Xiang+]

Modified insertion policy
    Dynamic Insertion Policy [2007, Qureshi+]
    Thread Aware DIP [2008, Jaleel+]

Page 19: Cache Replacement Policy Using Map-based Adaptive Insertion

Conclusion

Map-based Adaptive Insertion Policy (MAIP)
    Map-based data structure: x20 cost-effective
    Reuse possibility estimation exploiting spatial locality & temporal locality
    Improves performance from LRU/DIP

Evaluates MAIP with simulation study
    Reduces cache miss count by 8.3% from LRU
    Improves IPC by 2.1% in 1-core, by 9.1% in 4-core

Page 20: Cache Replacement Policy Using Map-based Adaptive Insertion

Comparison

Policy              | Replacement Algorithm           | Dead-block Elimination | Additional HW Cost
LRU                 | Insert to MRU                   | None                   | None
DIP [2007 Qureshi+] | Partially random insertion      | Some                   | Several counters (Light)
LRF [2009 Xiang+]   | Predicts from reference pattern | Strong                 | Shadow tag, PHT (Heavy)
MAIP                | Predicts based on two localities | Strong                | Mem access map (Medium)

Improves cost-efficiency by map data structure

Improves prediction accuracy by 2 localities

Page 21: Cache Replacement Policy Using Map-based Adaptive Insertion

Q & A

Page 22: Cache Replacement Policy Using Map-based Adaptive Insertion

How to Detect Insertion Position

function is_bypass()
    if(Sb = BYPASS) return true
    if(Ca > 16 x Cr) return true
    return false
endfunction

function get_insert_position()
    integer ins_pos=15
    if(Hm) ins_pos = ins_pos/2
    if(Cr > Ca) ins_pos=ins_pos/2
    if(Sb=REUSE) ins_pos=0
    if(Sb=USEFUL) ins_pos=ins_pos/2
    if(Sb=USELESS) ins_pos=15
    return ins_pos
endfunction
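The pseudocode above can be transcribed into runnable form as follows. This is a sketch under stated assumptions: Sb is taken to be the bypass filter table state, Ca and Cr the access and reuse counts of the memory access map, and Hm a flag meaning the memory access map reports the line as already touched. The slide does not define these symbols. Position 0 is assumed to be MRU and 15 to be LRU in a 16-way set, and the division is treated as integer division.

```python
from enum import Enum


class FilterState(Enum):
    BYPASS = 0
    USELESS = 1
    NORMAL = 2
    USEFUL = 3
    REUSE = 4


LRU_POSITION = 15  # assumed: 16-way set, position 0 is MRU


def is_bypass(sb: FilterState, ca: int, cr: int) -> bool:
    """Bypass when the instruction rarely reuses data or the zone is rarely reused."""
    if sb is FilterState.BYPASS:
        return True
    if ca > 16 * cr:
        return True
    return False


def get_insert_position(sb: FilterState, ca: int, cr: int, hm: bool) -> int:
    """Start at the LRU position and move toward MRU as reuse evidence accumulates."""
    ins_pos = LRU_POSITION
    if hm:
        ins_pos //= 2
    if cr > ca:
        ins_pos //= 2
    if sb is FilterState.REUSE:
        ins_pos = 0
    if sb is FilterState.USEFUL:
        ins_pos //= 2
    if sb is FilterState.USELESS:
        ins_pos = LRU_POSITION
    return ins_pos


# Example: map reports reuse, zone reuse exceeds first touches, instruction is USEFUL.
example = get_insert_position(FilterState.USEFUL, ca=4, cr=9, hm=True)
```

As in the pseudocode, the checks run in order, so a USELESS state overrides any earlier evidence and forces the LRU position.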