Replication Cache: A Small Fully Associative Cache to Improve Data Cache Reliability

By Wei Zhang, IEEE Member

IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 12, DECEMBER 2005

Roza Ghamari, Bogazici University



Why Fault-Tolerance in Cache?

Given current trends in transistor size, voltage, and clock frequency, future microprocessors will become increasingly susceptible to transient hardware failures.

Cache memories are more vulnerable.

Aggressive leakage-control techniques applied to caches also have a negative impact.

Soft errors in the cache can easily propagate.


Outline

1. Introduction

2. Replication Cache in detail

3. Schemes under Consideration

4. Evaluation Methodology

5. Results

6. Conclusion

7. References


Introduction

Error-Correcting Techniques:

Single Error Correcting-Double Error Detecting (SEC-DED)

Parity Check

N Modular Redundancy (NMR)

In-Cache Replication (I-CR)


Introduction (Cont.)

Single Error Correcting-Double Error Detecting (SEC-DED)

Fundamental limitation in error detection and correction

Cannot correct errors of two or more bits

Needs a read-modify-write cycle

Impacts performance

Nontrivial energy overhead
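The SEC-DED limitation listed above can be seen in a toy code. The sketch below is an illustrative extended Hamming (8,4) code, not the code used in any particular cache; the function names `secded_encode` and `secded_decode` and the 4-bit data width are assumptions chosen only to keep the example short.

```python
def secded_encode(data_bits):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the usual
    Hamming positions (parity at 1, 2, 4; data at 3, 5, 6, 7).
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    word = [0, p1, p2, d1, p4, d2, d3, d4]
    word[0] = sum(word) % 2   # makes the total number of 1s even
    return word


def secded_decode(word):
    """Return (status, data_bits); data_bits is None when uncorrectable."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos          # XOR of the positions holding a 1
    overall = sum(word) % 2
    if overall == 1:
        # Odd number of flips: treated as a single error at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        word[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even number of flips with a nonzero syndrome: detected only.
        return "double_error", None
    else:
        status = "ok"
    return status, [word[3], word[5], word[6], word[7]]


codeword = secded_encode([1, 0, 1, 1])
one_flip = list(codeword)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[6] ^= 1
```

Decoding `one_flip` recovers the data, while decoding `two_flips` only reports that an error occurred. Every partial write also has to recompute the check bits over the whole protected word, which is where the read-modify-write cycle comes from.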


Introduction (Cont.)

Parity Check

Cannot detect an even number of bit errors

No error correction
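A short sketch of why a single parity bit misses an even number of bit flips. The helper names `parity_bit` and `parity_ok` and the sample bit pattern are illustrative assumptions.

```python
def parity_bit(bits):
    """Even-parity bit: 1 when the number of 1s in `bits` is odd."""
    return sum(bits) % 2


def parity_ok(bits, stored_parity):
    """True when the recomputed parity matches the stored parity bit."""
    return parity_bit(bits) == stored_parity


data = [1, 0, 1, 1, 0, 0, 1, 0]
stored = parity_bit(data)

single_error = list(data)
single_error[3] ^= 1          # one flipped bit changes the parity

double_error = list(single_error)
double_error[6] ^= 1          # a second flip restores the parity
```

Here `parity_ok(single_error, stored)` is false, so the error is detected, but `parity_ok(double_error, stored)` is true even though two bits are wrong. In neither case does the parity bit say which bit to fix.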

N Modular Redundancy (NMR) with Majority Voting

Too expensive for microprocessors or embedded systems with stringent cost and area constraints
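Majority voting over three copies (N = 3) can be sketched as a bitwise operation. The function name `majority_vote` and the sample values are assumptions for illustration; the point is that each protected word needs three stored copies, which is the area cost noted above.

```python
def majority_vote(copy_a, copy_b, copy_c):
    """Bitwise majority of three integer copies of the same word."""
    return (copy_a & copy_b) | (copy_a & copy_c) | (copy_b & copy_c)


original = 0b10110010
corrupted = original ^ 0b00001000      # one copy suffers a bit flip

# Two good copies outvote the corrupted one.
voted = majority_vote(original, corrupted, original)

# If two copies are hit in the same bit, the vote returns the wrong value.
outvoted = majority_vote(corrupted, corrupted, original)
```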


Introduction (Cont.)

In-Cache Replication (I-CR):

Exploits “dead” blocks in the data cache to store replicas of “hot” blocks

A nontrivial portion of the data is left unprotected, which is not acceptable for applications demanding very high reliability

No dead-block predictor is perfect, which causes performance overhead

Replicas compete with the primary data for cache space, which degrades performance


Introduction (Cont.)

Replication Cache:

Main idea: use a small fully associative cache to store a replica for every write to the L1 data cache

Provides replicas for 100% of loads

Has no impact on performance

Much more area-efficient


Replication Cache in detail

A small fully associative cache between the CPU and the L2 cache

Stores the replicas of the “dirty” data in the L1 data cache

Address mapping is straightforward

On replication cache capacity misses, some replicas may be written back to the L2 cache

[Figure: memory hierarchy with the CPU on top; the L1 I-Cache, L1 D-Cache, and R-Cache (replication cache) on the next level; then the L2 cache; then memory.]

Replication Cache in detail (cont.)

When do we replicate?

Replicate data when it is written by the processor (that is, replicate the “dirty” data)

Also replicate the data on a replica miss in the replication cache

How do we protect the primary data and replicas?

Maintain a parity bit at byte granularity (no performance penalty in the common case)


Replication Cache in detail (cont.)

How do we replace a cache block if the replication cache is full?

Discard the least recently used block

Replicate the data in L2 on a replica miss (for applications that require full replication of the “dirty” data)

Use the LRU (Least Recently Used) algorithm for replica replacement
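The replacement policy above can be sketched as a small software model. This is a minimal illustration, not the hardware design: the class name `ReplicationCache`, the `write_back_to_l2` flag, and the dictionary standing in for L2 are all assumptions, and the write-back-on-eviction behavior is one reading of the slides.

```python
from collections import OrderedDict


class ReplicationCache:
    """Toy model of a small fully associative replica store with LRU."""

    def __init__(self, num_entries, write_back_to_l2=False):
        self.num_entries = num_entries
        self.write_back_to_l2 = write_back_to_l2
        self.entries = OrderedDict()   # block address -> replica, LRU first
        self.l2_replicas = {}          # stand-in for replicas kept in L2

    def write(self, addr, data):
        """Record a replica for a store to `addr`, evicting the LRU entry if full."""
        if addr in self.entries:
            self.entries.move_to_end(addr)
        elif len(self.entries) >= self.num_entries:
            victim_addr, victim_data = self.entries.popitem(last=False)
            if self.write_back_to_l2:
                # Full-replication mode: keep the evicted replica in L2.
                self.l2_replicas[victim_addr] = victim_data
        self.entries[addr] = data

    def lookup(self, addr):
        """Return the replica for `addr`, or None on a replica miss."""
        if addr in self.entries:
            self.entries.move_to_end(addr)   # a hit makes the entry most recent
            return self.entries[addr]
        return None


rc = ReplicationCache(num_entries=2, write_back_to_l2=True)
rc.write(0x10, "A")
rc.write(0x20, "B")
rc.lookup(0x10)        # touch 0x10 so that 0x20 becomes the LRU entry
rc.write(0x30, "C")    # cache is full: 0x20 is evicted and written back
```

Because the cache is fully associative, any address can occupy any entry, so the only decision on a full cache is which entry to evict.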


Replication Cache in detail (cont.)

How many replicas do we need?

Multiple replicas can be kept within one replication cache (much more area-efficient)

How do we detect soft errors?

Compare the data in L1 with its replica in the replication cache, in parallel

Loads take two cycles and stores take one cycle


Replication Cache in detail (cont.)

How do we recover from soft errors?

Block is not “dirty”: load the block from L2

Block is “dirty”: use the replicas in the replication cache to correct the error

Soft error in a replica: use majority voting if multiple copies of the same data exist
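The recovery rules above can be sketched as a load routine. This is a simplified model under stated assumptions: each L1 line is a dictionary with `data`, `parity`, and `dirty` fields, one parity bit covers the whole word (the slides use byte granularity), and `replicas` and `l2` are plain dictionaries. The name `load_with_recovery` is hypothetical.

```python
def parity(value):
    """Even-parity bit of a non-negative integer."""
    return bin(value).count("1") % 2


def load_with_recovery(addr, l1, replicas, l2):
    """Return (data, source) for a load, repairing the L1 line if parity fails."""
    line = l1[addr]
    if parity(line["data"]) == line["parity"]:
        return line["data"], "l1"
    if line["dirty"]:
        # Dirty data exists nowhere else, so the replica is the only good copy.
        if addr not in replicas:
            raise RuntimeError("dirty block corrupted and no replica available")
        good, source = replicas[addr], "replica"
    else:
        # Clean data still matches the copy held in L2.
        good, source = l2[addr], "l2"
    line["data"] = good
    line["parity"] = parity(good)
    return good, source


l1 = {
    0x40: {"data": 0b1011, "parity": parity(0b1011), "dirty": True},
    0x80: {"data": 0b0110, "parity": parity(0b0110), "dirty": False},
}
replicas = {0x40: 0b1011}
l2 = {0x80: 0b0110}

l1[0x40]["data"] ^= 0b0100   # inject a soft error into the dirty block
l1[0x80]["data"] ^= 0b0001   # inject a soft error into the clean block
```

Calling `load_with_recovery(0x40, l1, replicas, l2)` repairs the dirty block from its replica, and the same call for `0x80` reloads the clean block from L2.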


Schemes under Consideration

Base:

Normal L1 data cache without the replication cache

Parity protection

RC-P:

One replication cache

Parity protection

In case of soft errors, the replication cache is accessed

RC-C:

The replication cache and the L1 data cache are searched in parallel and compared with each other before the load returns

RC-2:

Two replicas in the replication cache for every write (majority voting)


Schemes under Consideration (Cont.)

For all RC schemes, conservatively assume two cycles for load operations and one cycle for store operations

The RC-C and RC-2 schemes use parallel comparison to detect errors, which enables multi-bit error detection

With parallel comparison, the one-cycle latency is hidden if execution proceeds speculatively
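A small sketch of why comparing the L1 data against its replica catches multi-bit errors that parity alone lets through. The names `parity` and `mismatch_mask` and the sample word are illustrative assumptions, and the comparison is modeled as a plain XOR rather than a parallel hardware lookup.

```python
def parity(value):
    """Even-parity bit of a non-negative integer."""
    return bin(value).count("1") % 2


def mismatch_mask(l1_word, replica_word):
    """Bits that differ between the L1 copy and its replica (0 means they agree)."""
    return l1_word ^ replica_word


replica = 0xDEADBEEF
l1_word = replica ^ 0b101          # soft error flips bits 0 and 2 in L1

parity_detects = parity(l1_word) != parity(replica)
compare_detects = mismatch_mask(l1_word, replica) != 0
```

With two flipped bits `parity_detects` is false while `compare_detects` is true, and the mask shows which bits disagree. The comparison alone does not say which copy is wrong; that is what parity or a second replica is for.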


Evaluation Methodology

Evaluation Metrics

Execution Cycles: the time taken to execute 200 million application instructions

Loads with Replica: the fraction of read hits that have replicas in the replication cache

Implemented by modifying SimpleScalar 3.0

Eight applications from SPEC 2000 are used for evaluation
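The Loads with Replica metric can be illustrated on a toy trace. This sketch is not the SimpleScalar setup: it assumes every load is an L1 read hit, creates replicas only on stores, and uses a hypothetical function `loads_with_replica` with a hand-written trace of `(operation, address)` pairs.

```python
from collections import OrderedDict


def loads_with_replica(trace, num_entries):
    """Fraction of loads that find a replica in an LRU store of `num_entries`."""
    replicas = OrderedDict()     # address -> True, least recently used first
    loads = 0
    covered = 0
    for op, addr in trace:
        if op == "store":
            if addr in replicas:
                replicas.move_to_end(addr)
            elif len(replicas) >= num_entries:
                replicas.popitem(last=False)    # evict the LRU replica
            replicas[addr] = True
        else:
            loads += 1
            if addr in replicas:
                covered += 1
                replicas.move_to_end(addr)
    return covered / loads if loads else 0.0


trace = [
    ("store", 1), ("store", 2),
    ("load", 1), ("load", 2),
    ("store", 3),
    ("load", 1), ("load", 3),
]
small = loads_with_replica(trace, num_entries=2)
large = loads_with_replica(trace, num_entries=4)
```

On this trace the two-entry store covers three of the four loads, and the four-entry store covers all of them.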


Results

Size of the Replication Cache

[Figure: Loads with Replica (y-axis, 0 to 1.2) for bzip2 and equake at replication cache sizes 2, 4, 8, 16, and 32 (x-axis).]

Results (Cont.)

Verifying the effectiveness of the replication cache

[Figure: Loads with Replica and hit rate for bzip2 and equake (y-axis, 0.97 to 1) at sizes 8K, 16K, 32K, 64K, and 128K (x-axis).]

Results (Cont.)

Comparison between schemes

[Figure: Loads with Replica (y-axis, 0 to 1.2) under RC-P, RC-2, and ICR for bzip2, equake, gcc, gzip, mcf, mesa, vortex, and vpr.]

Results (Cont.)

Performance Comparison

[Figure: replication cache write-back rate (y-axis, 0 to 0.06) under RC-P for bzip2, equake, gcc, gzip, mcf, mesa, vortex, and vpr.]

Results (Cont.)

Performance Comparison

[Figure: normalized execution cycles for RC-C (y-axis, 0.98 to 1.14) for bzip2, equake, gcc, gzip, mcf, mesa, vortex, and vpr.]

Conclusion

A Fully Associative Replication Cache

RC-P:

▪ Applications that only need parity-based protection

RC-C, RC-2:

▪ Applications that require higher data integrity

▪ Applications that operate in highly noisy environments


References

[1] J. Ray, J.C. Hoe, and B. Falsafi, “Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery,” Proc. MICRO, Dec. 2001.

[2] W. Zhang, S. Gurumurthi, M. Kandemir, and A. Sivasubramaniam, “ICR: In-Cache Replication for Enhancing Data Cache Reliability,” Proc. Int’l Conf. Dependable Systems and Networks (DSN), 2003.

[3] V. Degalahal, N. Vijaykrishnan, and M.J. Irwin, “Analyzing Soft Errors in Leakage Optimized SRAM Design,” Proc. VLSI Design Conf., Jan. 2003.


Thanks
