Page 1:

ARCHITECTURAL TECHNIQUES

TO ENHANCE DRAM SCALING

Thesis Defense

Yoongu Kim

Page 2:

CPU+CACHE

MAIN MEMORY

STORAGE

Page 3:

Complex Problems

Large Datasets

High Throughput

Page 4:

DRAM Module

DRAM Chip

DRAM Cell (Capacitor): ‘0’ or ‘1’

Page 5:

1971 (FIRST DRAM CHIP): 10^3 Cells

2015: 10^9 Cells

Page 6:

1971: 10^3

2015: 10^9

FUTURE?

Page 7:

DRAM SCALING

TECHNOLOGICAL FEASIBILITY: Can we make smaller cells?

ECONOMIC VIABILITY: Should we make smaller cells?

Page 8:

SMALLER CELLS

LOWER COST/BIT

1. RELIABILITY TAX

2. PERFORMANCE TAX

Page 9:

1. RELIABILITY

COUPLING BETWEEN NEARBY CELLS

ROW HAMMER (ISCA 2014)

Your DRAM chips are probably broken.

Page 10:

2. PERFORMANCE

ABNORMALLY SLOW OUTLIER CELLS

BANK CONFLICTS (ISCA 2012)

Our solution may be adopted by industry.

Page 11:

CO-DESIGN: CPU & DRAM

CPU: Memory Management, DRAM Controller

DRAM: Arch & Interface, Circuits & Devices

Page 12:

THESIS STATEMENT

The degradation in DRAM reliability & performance can be effectively mitigated by making low-overhead, non-intrusive modifications to the DRAM chips and the DRAM controller.

Page 13:

DRAM SCALING

ARCHITECTURAL SUPPORT

MAIN MEMORY: LARGER, FASTER, RELIABLE, EFFICIENT

Page 14:

THREE CONTRIBUTIONS

1. We show that DRAM scaling is negatively affecting reliability.

We expose a new type of DRAM failure, and propose a cost-effective way to address it.

Page 15:

THREE CONTRIBUTIONS

2. We propose a high-performance architecture for DRAM that mitigates its growing latency.

We identify bottlenecks in DRAM’s internal design and alleviate them in a cost-effective manner.

Page 16:

THREE CONTRIBUTIONS

3. We develop a new simulator for facilitating rapid design space exploration of DRAM.

The simulator is the fastest, while also being easy to modify due to its modular design.

Page 17:

OUTLINE

1. RELIABILITY: ROW HAMMER

2. PERF: BANK CONFLICT

3. SIMULATOR: RAMULATOR

4. CONCLUSION

Page 18:

1. ROW HAMMER

FLIPPING BITS IN MEMORY WITHOUT ACCESSING THEM

ISCA 2014

Page 19:

[Figure: ROWs in a DRAM CHIP. The WORDLINE of the AGGRESSOR ROW toggles between LOW VOLTAGE and HIGH VOLTAGE; the neighboring ROWs are VICTIMs.]

READ DATA FROM HERE, GET ERRORS OVER THERE

Page 20:

GOOGLE’S EXPLOIT

http://googleprojectzero.blogspot.com

“We learned about rowhammer from Yoongu Kim et al.”

Page 21:

GOOGLE’S EXPLOIT

PROPOSED SOLUTIONS

OUR PROOF-OF-CONCEPT

EMPIRICAL ANALYSIS

Page 22:

REAL SYSTEM

[Figure: x86 CPU and DRAM]

MANY READS TO SAME ADDRESS ≠ OPEN/CLOSE SAME ROW

1. CACHE HITS 2. ROW HITS

Page 23:

LOOP:
  mov (X), %reg
  mov (Y), %reg
  clflush (X)
  clflush (Y)
  jmp LOOP

[Figure: the x86 CPU repeatedly reads DRAM rows X and Y. Rows initialized to all 1s end up with flipped bits, e.g., 11011110010 and 10111010111.]

MANY ERRORS!

http://www.github.com/CMU-SAFARI/rowhammer
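
The point of the two slides above (many reads to the same address do not open and close the same row, because of cache hits and row hits) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the test program from the repository: `Bank`, `count_activations`, and the row numbers are hypothetical, and every read is assumed to reach DRAM, which is the effect the `clflush` instructions have in the loop.

```python
class Bank:
    """Toy DRAM bank: one row-buffer that holds at most one open row."""

    def __init__(self):
        self.open_row = None      # row currently held in the row-buffer
        self.activations = {}     # row -> number of times it was opened

    def read(self, row):
        # A row hit is served from the row-buffer; the wordline is not toggled.
        if row == self.open_row:
            return
        # A row miss closes the open row and opens (activates) the new one.
        self.open_row = row
        self.activations[row] = self.activations.get(row, 0) + 1


def count_activations(pattern, iterations):
    """Replay an access pattern, assuming every read reaches DRAM (no cache hits)."""
    bank = Bank()
    for _ in range(iterations):
        for row in pattern:
            bank.read(row)
    return bank.activations


ROW_X, ROW_Y = 10, 20  # hypothetical rows that map to the same bank

same_address = count_activations([ROW_X], 1000)
alternating = count_activations([ROW_X, ROW_Y], 1000)
print(same_address)   # row X is opened once; later reads are row hits
print(alternating)    # rows X and Y are each re-opened on every iteration
```

In this model, reading a single address toggles the wordline once, while alternating between X and Y toggles both wordlines on every iteration, which is why the loop needs two addresses.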

Page 24:

WHY DO THE ERRORS OCCUR?

Page 25:

DRAM CELLS ARE LEAKY

[Figure: CHARGE (between ‘0’ and ‘1’) vs. TIME for a NORMAL CELL, with REFRESH at 0ms and 64ms]

Page 26:

DRAM CELLS ARE LEAKY

[Figure: CHARGE (between ‘0’ and ‘1’) vs. TIME for a VICTIM CELL next to an AGGRESSOR, with REFRESH at 0ms and 64ms]

Page 27:

ROOT CAUSE?

COUPLING
• Electromagnetic
• Tunneling

ACCELERATES CHARGE LOSS

Page 28:

AS DRAM SCALES …

• CELLS BECOME SMALLER
  Less tolerance to coupling effects
• CELLS ARE PLACED CLOSER TOGETHER
  Stronger coupling effects

COUPLING ERRORS MORE LIKELY

Page 29:

1. ERRORS ARE RECENT
   Not found in pre-2010 chips

2. ERRORS ARE WIDESPREAD
   >80% of chips have errors
   Up to one error per ~1K cells

Page 30:

[Figure: test infrastructure. A PC connects over PCIe to an FPGA containing the Test Engine and DRAM Ctrl, which drives the DRAM.]

Page 31:

[Photo labels: Temperature Controller, PC, Heater, FPGAs]

Page 32:

MOST MODULES AT RISK

A VENDOR: 86% (37/43)

B VENDOR: 83% (45/54)

C VENDOR: 88% (28/32)

Page 33:

[Figure: ERRORS PER 10^9 CELLS vs. MANUFACTURE DATE; MODULES: A, B, C]

Page 34:

DISTURBING FACTS

• AFFECTS ALL VENDORS
  Not an isolated incident
  Deeper issue in DRAM scaling
• UNADDRESSED FOR YEARS
  Could impact systems in the field

Page 35:

HOW TO PREVENT COUPLING ERRORS?

Previous Approaches
1. Make Better Chips: Expensive
2. Rigorous Testing: Takes Too Long

Page 36:

[Figure: with FASTER ACCESS to a ROW, the neighboring ROWs show MORE ERRORS; with FREQUENT REFRESH, the neighboring ROWs show FEWER ERRORS]

Page 37:

[Figure: TOTAL ERRORS vs. ACCESS INTERVAL (ns), from 55ns (Faster) to 500ns (Slower); ONE MODULE each from A, B, C]

64ms / 500ns = 128K

Page 38:

[Figure: TOTAL ERRORS vs. REFRESH INTERVAL (ms), from 11ms (Often) to 64ms (Seldom); ONE MODULE each from A, B, C]

11ms / 55ns = 200K

Page 39:

TWO NAIVE SOLUTIONS

1. LIMIT ACCESSES TO ROW
   Access Interval > 500ns

2. REFRESH ALL ROWS OFTEN
   Refresh Interval < 11ms

LARGE OVERHEAD: PERF, ENERGY, COMPLEXITY
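
Both thresholds come from dividing a refresh window by an access interval. The sketch below simply re-checks the 128K and 200K figures from the two plots in integer nanoseconds; the helper name `max_accesses` is illustrative only.

```python
NS_PER_MS = 1_000_000


def max_accesses(refresh_interval_ms, access_interval_ns):
    """Upper bound on accesses to one row within a single refresh interval."""
    return (refresh_interval_ms * NS_PER_MS) // access_interval_ns


# Default 64ms refresh interval, accesses slowed to one per 500ns.
slow_access = max_accesses(64, 500)
# Refresh interval shortened to 11ms, accesses at the fastest 55ns interval.
fast_refresh = max_accesses(11, 55)

print(slow_access)    # the 128K on the access-interval plot
print(fast_refresh)   # the 200K on the refresh-interval plot
```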

Page 40:

OUR SOLUTION: PARR
Probabilistic Adjacent Row Refresh

After closing any row ...
• 99.9%: Do nothing
• 0.1%: Refresh (=Open) adjacent rows
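
The rule on this slide is small enough to sketch in a few lines. The following is a hedged illustration, not the thesis implementation: the class `ParrController`, its methods, and the assumption that physically adjacent rows are `row - 1` and `row + 1` are all hypothetical. The counter exists only to print statistics; the rule itself keeps no per-row state.

```python
import random


class ParrController:
    """Toy memory controller applying the probabilistic rule to one bank."""

    def __init__(self, num_rows, refresh_probability=0.001, rng=None):
        self.num_rows = num_rows
        self.p = refresh_probability            # 0.1% on the slide
        self.rng = rng if rng is not None else random.Random()
        self.adjacent_refreshes = 0             # statistics only

    def neighbors(self, row):
        # Assumption: adjacent rows are row-1 and row+1 when they exist.
        return [r for r in (row - 1, row + 1) if 0 <= r < self.num_rows]

    def close_row(self, row):
        """Called whenever a row is closed; returns the rows refreshed as a result."""
        if self.rng.random() < self.p:
            refreshed = self.neighbors(row)
            self.adjacent_refreshes += len(refreshed)
            return refreshed
        return []                                # the common case: do nothing


ctrl = ParrController(num_rows=32768, rng=random.Random(0))
for _ in range(128_000):                         # hammer one row 128K times
    ctrl.close_row(100)
print(ctrl.adjacent_refreshes)                   # neighbors get refreshed many times
```

Because the decision is a coin flip made at every row close, a heavily accessed row triggers refreshes of its neighbors in proportion to how often it is closed.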

Page 41:

PARR: CHANCE OF ERROR

• NO REFRESHES IN N TRIALS
  Probability: 0.999^N

• N=128K FOR ERROR (64ms)
  Probability: 0.999^(128K) ≈ 10^(-56)

STRONG RELIABILITY GUARANTEE
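
The exponent on this slide is easiest to verify in log space. A short check, assuming 128K means 128,000 (that is, 64ms / 500ns) and a refresh probability of 0.1% per row close; the constant names are illustrative.

```python
import math

P_REFRESH = 0.001     # chance that closing a row refreshes its neighbors
N = 128_000           # row closes needed within 64ms to cause an error

# log10 of the probability that N consecutive closes trigger no refresh.
log10_p = N * math.log10(1.0 - P_REFRESH)
probability = (1.0 - P_REFRESH) ** N

print(f"exponent: {log10_p:.1f}")
print(f"probability: {probability:.1e}")
```

The exponent comes out between -56 and -55, which matches the order of magnitude quoted on the slide.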

Page 42:

STRONG RELIABILITY: 9.4 × 10^(-14) Errors/Year

LOW PERF OVERHEAD: 0.20% Slowdown

NO STORAGE OVERHEAD: 0 Bytes

Page 43:

RELATED WORK

• Security Exploit (Seaborn@Google 2015)
• Industry Analysis (Kang@SK Hynix 2014)
  “... will be [more] severe as technology shrinks down.”
• Targeted Row Refresh (JEDEC 2014)
• DRAM Testing (e.g., Van de Goor+ 1999)
• Disturbance in Flash & Hard Disk

Page 44:

RECAP: RELIABILITY

CPU: Memory Management, DRAM Controller (PARR)

DRAM: Arch & Interface, Circuits & Devices (ROW HAMMER)

Page 45:

2. BANK CONFLICTS

A CASE FOR SUBARRAY PARALLELISM

ISCA 2012

Page 46:

[Figure: distribution of DRAM CELLS from FASTEST to SLOWEST, with a CUTOFF]

5× “WRITE PENALTY”
Source: Samsung & Intel, The Memory Forum 2014

Page 47:

FIGHT LATENCY WITH MORE PARALLELISM

Page 48:

[Figure: the CPU sends two writes (WR) to two different BANKs in a CHIP]

DIFFERENT BANKS: SMALL LATENCY

Page 49:

[Figure: the CPU sends two writes (WR) to the same BANK in a CHIP]

BANK CONFLICT: LARGE LATENCY

Page 50:

BANK CONFLICT LATENCY

[Figure: two WRs to the same bank on a TIME axis]

• SERIALIZATION
• WRITE PENALTY (GETTING WORSE)
• “ROW-BUFFER” THRASHING

Page 51:

[Figure: a BANK in which ROWs are connected by BITLINES to a ROW-BUFFER; MEMORY REQUESTs arrive and DATA is returned]

BITLINES: CONNECT ROWS TO ROW-BUFFER

ROW-BUFFER: CACHES A COPY OF OPENED ROW

Page 52:

HOW TO PARALLELIZE BANK CONFLICTS?

Previous Approaches
1. Have More Banks: Expensive
2. Bank Interleaving: Non-Solution

Page 53:

BANK (LOGICAL VIEW)

[Figure: MANY ROWS (~100K) and JUST ONE row BUFFER]

CANNOT DRIVE LONG BITLINES

Page 54:

BANK (PHYSICAL VIEW)

[Figure: the bank is divided into SUBARRAYs of FEWER ROWS (~1K). Each subarray has a LOCAL (SUBARRAY) BUFFER and DECDR.; a GLOBAL (BANK) BUFFER and DECODER serve the whole bank.]

Page 55:

SHARING = ROOT OF EVIL

[Figure: SUBARRAYs, each with ROWs, a local BUFFER and DECDR., connected to the shared global DECODER and BUFFER]

SHARED:
1. GLOBAL DEC.
2. GLOBAL BUF.

NO SUBARRAY PARALLELISM

Page 56:

PROBLEM #1: DECODER

[Figure: two subarrays (ROW1, ROW2 and ROW3, ROW4), each with a BUFFER and DECDR., share one DECODER that receives ADDR1 and ADDR3. Sequence shown: ROW1 OPENED, ROW1 CLOSED, ROW3 OPENED.]

Page 57:

OUR SOLUTION

[Figure: a REG. is added that STORES ADDRESS FOR EACH SUBARRAY. With ADDR1 and ADDR3 sent to the DECODER, ROW1 is STILL OPEN and ROW3 is ALSO OPEN (BOTH).]

Page 58:

PROBLEM #2: BUFFER

[Figure: two subarrays (ROW1, ROW2 and ROW3, ROW4), each with a local BUFFER and DECDR., share one global BUFFER and DECODER; BOTH local buffers connect to the global BUFFER]

Page 59:

OUR SOLUTION

[Figure: the same two subarrays (ROW1, ROW2 and ROW3, ROW4), with BOTH rows open; a REG. is added that SELECTS SUBARRAY FOR ACCESS to the global BUFFER]
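
The two fixes (a register that stores an address for each subarray, and a register that selects which subarray is accessed) can be mimicked with a toy bank model. This is a rough sketch under assumptions, not the hardware design: the class `Bank`, the flag `subarray_latches`, the 1024 rows per subarray, and the access pattern are all hypothetical.

```python
class Bank:
    """Toy bank made of subarrays.

    subarray_latches=False: one shared address latch, so opening a row
    closes whatever row was open anywhere else in the bank.
    subarray_latches=True: one latch per subarray, so each subarray can
    keep its own row open at the same time.
    """

    def __init__(self, rows_per_subarray, subarray_latches):
        self.rows_per_subarray = rows_per_subarray
        self.subarray_latches = subarray_latches
        self.open_rows = {}     # subarray index -> row open in its local buffer
        self.selected = None    # subarray currently driving the global buffer
        self.activations = 0

    def access(self, row):
        sub = row // self.rows_per_subarray
        if self.open_rows.get(sub) != row:
            if not self.subarray_latches:
                self.open_rows.clear()    # shared latch: all other rows close
            self.open_rows[sub] = row
            self.activations += 1
        self.selected = sub               # pick the local buffer to read or write


def run(subarray_latches):
    bank = Bank(rows_per_subarray=1024, subarray_latches=subarray_latches)
    for _ in range(100):
        bank.access(1)       # a row in subarray 0
        bank.access(1025)    # a row in subarray 1
    return bank.activations


baseline = run(subarray_latches=False)
proposed = run(subarray_latches=True)
print(baseline)   # every access re-opens a row
print(proposed)   # each row is opened once and then stays open
```

In the baseline every alternating access pays for a fresh row activation, while with per-subarray latches both rows stay open and later accesses only switch the selected subarray.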

Page 60:

[Figure: timelines of three WRs over TIME. SERIAL SUBARRAYS: the WRs are processed one after another. PARALLEL SUBARRAYS: the WRs overlap.]

Page 61:

[Figure: SPEEDUP (0% to 20%) vs. SUBARRAY PARALLELISM (1x to 128x), with regions labeled NONPARALLEL and PARALLEL; +17% is marked]

• Simulation: Out-of-Order CPU + DDR3-1066
• Benchmarks: SPEC/TPC/STREAM/RANDOM

Page 62:

              SUBARRAYS   vs.   BANKS
PARALLELISM   8x                8x
SPEEDUP       +17%              +20%
CHIP-SIZE     +0.2%             +36%

Estimated using DRAM area model from Rambus

Page 63:

RELATED WORK

• Module Partitioning (e.g., Zheng+ 2008)
  Divide module into small, independent subsets
  Narrower data-bus → Higher unloaded latency
• Hierarchical Bank (Yamauchi+ 1997)
  Parallelizes accesses to different subarrays
  Does not utilize multiple local row-buffers
• Cached DRAM (e.g., Hidaka+ 1990)
• Low-Latency DRAM (e.g., Sato+ 1998)

Page 64:

RECAP: PERFORMANCE

CPU: Memory Management, DRAM Controller (SUBARRAY AWARE)

DRAM: Arch & Interface (PARALLEL SUBARRAYS), Circuits & Devices (SLOW OUTLIERS)

Page 65:

3. RAMULATOR

RAMULATOR: A FAST AND EXTENSIBLE DRAM SIMULATOR

IEEE CAL 2015

Page 66:

NEW IDEAS FOR BETTER DRAM ... MUST BE VETTED BY SIMULATION

[Figure: several IDEAs feeding into a DRAM SIMULATOR]

Page 67:

PREVIOUS SIMULATORS

• PRO: HIGH FIDELITY
  – Cycle-accurate DRAM models
• CON: LOW FLEXIBILITY
  – Hardcoded for DDR3/DDR4 DRAM
  – Difficult to extend to others
    E.g., LPDDRx, WIOx, GDDRx, RLDRAMx, HBM, HMC

Page 68:

OUR RAMULATOR

• BUILT FOR EXTENSIBILITY
  – Easy to incorporate new ideas
• HIGH SIMULATION SPEED
  – 2.5x faster than the next fastest
• PORTABLE: SIMPLE C++ API

Page 69:

RAMULATOR’S APPROACH

[Figure: the DRAM SYSTEM is modeled as STATE MACHINES arranged as CHANNEL, RANK, BANK; DRAM COMMAND & ADDRESS are the INPUTS]

Page 70:

RAMULATOR: RECONFIGURABLE STATE MACHINE

STATE MACHINES ARE DEFINED BY:
• STATES
• EDGES

[Figure: example state machines for BANK, RANK, and CHANNEL (states labeled X, Y, Z, A, B, C, D, P, Q) next to a TEMPLATE whose states and edges are left open (?)]

Page 71:

TREE OF STATE MACHINES

[Figure: a tree of TEMPLATE state machines that receives COMMAND & ADDRESS; each standard fills in the templates]

DDR4
• Hierarchy
• States
• Edges

GDDR5
• Hierarchy
• States
• Edges

HBM
• Hierarchy
• States
• Edges
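
The idea of one generic template that a standard configures with a hierarchy, states, and edges can be sketched as follows. This is a Python illustration of the concept only; it is not Ramulator's C++ API, and `Node`, `TOY_SPEC`, and the two-state bank are hypothetical, heavily simplified stand-ins (no real DDR4 states or timing constraints).

```python
class Node:
    """Generic state machine template; a spec supplies hierarchy, states, edges."""

    def __init__(self, spec, level=0):
        self.level_name = spec["hierarchy"][level]
        self.state = spec["initial_state"].get(self.level_name)
        self.edges = spec["edges"].get(self.level_name, {})
        # Build the tree: each node owns `fanout` children of the next level.
        self.children = []
        if level + 1 < len(spec["hierarchy"]):
            count = spec["fanout"][spec["hierarchy"][level + 1]]
            self.children = [Node(spec, level + 1) for _ in range(count)]

    def issue(self, command, address):
        """Apply a command along the path given by `address` (one index per child level)."""
        # Commands with no matching edge at this level leave the state unchanged.
        next_state = self.edges.get((self.state, command))
        if next_state is not None:
            self.state = next_state
        if self.children and address:
            self.children[address[0]].issue(command, address[1:])


# Hypothetical toy spec: three levels, and only banks have states and edges.
TOY_SPEC = {
    "hierarchy": ["channel", "rank", "bank"],
    "fanout": {"rank": 2, "bank": 8},
    "initial_state": {"bank": "closed"},
    "edges": {
        "bank": {
            ("closed", "ACT"): "opened",
            ("opened", "PRE"): "closed",
        }
    },
}

channel = Node(TOY_SPEC)
channel.issue("ACT", [1, 3])                       # rank 1, bank 3
target = channel.children[1].children[3].state
other = channel.children[1].children[4].state
print(target, other)
```

Supporting another standard in this sketch means writing another spec dictionary; the `Node` template itself does not change.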

Page 72:

[Figure: RAMULATOR combines templates (TMP) configured for DDR4, GDDR5, HBM, LPDDR4, and WIO2 with a DRAM CONTROLLER; its inputs are a DRAM TRACE, a CPU FRONTEND, or a FULL SYSTEM]

RAMULATOR
• PERFORMANCE
• POWER (for some)

http://www.github.com/CMU-SAFARI/ramulator

Page 73:

Simulator | Supported DRAM Specifications | Simulation Speed (DDR3)
Ramulator | DDR3/4, LPDDR3/4, GDDR5, HBM, WIO1/2, Subarray Parallelism, etc. | 2.70x
DRAMSim2 (Rosenfeld et al.) | DDR2/3 | 1x
USIMM (Chatterjee et al.) | DDR3 | 1.08x
DrSim (Jeong et al.) | DDR2/3, LPDDR2 | 0.11x
NVMain (Poremba and Xie) | DDR3, LPDDR3/4 | 0.30x

Page 74:

4. CONCLUSION

Page 75:

SUMMARY

Traditional DRAM Scaling at Risk

Architectural Techniques for Coping with Degrading DRAM

Regain Reliability & Performance

Page 76:

CONTRIBUTIONS

1. We show that DRAM scaling is negatively affecting reliability.

2. We propose a high-performance, cost-effective DRAM architecture.

3. We develop a new simulator for facilitating DRAM research.

Page 77:

RESEARCH

RELIABILITY
• Row Hammer: Kim et al., ISCA ’14
• Retention Failures: Liu et al., ISCA ’13; Khan et al., SIGMETRICS ’14
• Speed vs. Reliability: Lee et al., HPCA ’15

EFFICIENCY
• Bank Conflicts: Kim et al., ISCA ’12
• In-DRAM Page Copy: Seshadri et al., MICRO ’13
• Tiered Latency: Lee et al., HPCA ’13
• Fine-Grained Refreshes: Chang et al., HPCA ’14

Page 78:

RESEARCH

COMPRESSION
• Simple & Effective: Pekhimenko et al., MICRO ’13

SCHEDULING
• High Throughput: Kim et al., HPCA ’10
• Quality of Service: Kim et al., MICRO ’10; Kim et al., IEEE Micro Top Picks ’11; Subramanian et al., HPCA ’13

SIMULATION
• Fast & Extensible: Kim et al., IEEE CAL ’15

Page 79:

OUTDATED ASSUMPTIONS

• DRAM SCALING WILL TAKE CARE OF MAIN MEMORY
• LATEST AND CHEAPEST DRAM IS THE GREATEST

Page 80:

NEW TECHNOLOGIES

NONVOLATILE MEMORY (NVM)
Image: Loke et al., Science 2012

3-DIMENSIONAL DIE STACKING
Image: Micron Technology, 2015

Page 81:

TREND: HETEROGENEITY

CACHE: SRAM

MAIN MEMORY: DRAM

STORAGE: FLASH/HDD

Also shown: eDRAM, 3D DRAM, NVM

Page 82:

TREND: SPECIALIZATION

MOBILE

GRAPHICS

NETWORKS

PC/SERVER

BANDWIDTH

POWER

LATENCY

COST

Page 83:

HOW TO REFORMULATE THE MEMORY HIERARCHY?

Page 84:

• NEW SOFTWARE ABSTRACTIONS
  Primitives for managing heterogeneity
• NEW HARDWARE TRADE-OFFS
  Defining appropriate roles for each tier
• RICH MEMORY CAPABILITIES
  Memory as more than a “bag of bits”
