a brief introduction to memory system proportionality and resilience · 2015-03-09 · a brief...

A brief introduction to Memory system proportionality and resilience Mattan Erez The University of Texas at Austin

Page 1

A brief introduction to

Memory system proportionality and resilience

Mattan Erez

The University of Texas at Austin

Page 2

•With support from

– NSF Award #0954107

(c) Mattan Erez

Page 3

•The constraints:

– Power/energy

– Time

– Money

– Correctness

Page 5

•Can’t work harder; have to work smarter

•Efficient & Proportional

Page 6

•Multiple constraints and needs

• Heterogeneity

• asymmetry

• specialization

Page 7

•big.LITTLE and per-core DVFS

– Static and dynamic asymmetry

•Accelerators

– Programmable and fixed-function

Page 8

•Throughput cores

•Latency cores

•Specialized cores

proportionality of compute

Page 9

Great, we solved the easy part: proportional compute

Page 10

•Programming is hard

•Memory is hard

Page 11

•The memory challenge (outline)

– Why do memory systems look the way they do?

• Efficiency and reliability

• Abstractions for architects

– Role of heterogeneity and programs

– Some interesting mechanisms

– Where do we go from here?

Page 12

•Storage device options abound

– SRAM

– DRAM

– FLASH

– STT-RAM / MRAM

– PCM, Memristor, RRAM, …

Page 13

•Architects should think in abstractions

– Research should be general

Page 14

•The fundamentals

•Cheap

•Energy efficient

•High capacity

•Low latency

•High throughput

•Reliable

Page 15

•Memory must be cheap

– Memory is already too expensive

• 15–50% of system cost (or more)

– But margins are razor thin (commodity)

Page 16

•Memory must be energy efficient

– Memory accounts for 15–60+% of “compute” energy, and the share is getting worse

• More capacity

• Proportional compute

Page 17

•Memory must be high capacity

– Memory footprints keep growing

• More interesting data

• More data

• More specialized data structures

Page 20

•Memory must be high capacity

– Many chips

– Denser memory, fewer chips

– Many dice, fewer chips

[Figure © Micron]

Page 21

•Memory composed of

– Multiple modules

– Each module with N >=1 components

– Lots of cells per component

– Each cell >= 1 bit

Page 22

•Memory should be low latency

– Latency == performance

– Except when there is sufficient concurrency
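
The concurrency caveat can be made concrete with Little's law: sustained throughput equals the number of requests in flight divided by the latency of each request. The sketch below is illustrative only; the function names, the 64-byte request size, and the 80 ns latency are assumptions, not figures from the talk.

```python
def sustained_bandwidth_gbs(in_flight, latency_ns, bytes_per_request=64):
    """Little's law: throughput = concurrency / latency.

    Bytes per nanosecond is numerically equal to GB/s (1e9 bytes per second).
    """
    return in_flight * bytes_per_request / latency_ns


def requests_needed(target_gbs, latency_ns, bytes_per_request=64):
    """Concurrency required to sustain target_gbs at the given latency."""
    return target_gbs * latency_ns / bytes_per_request


if __name__ == "__main__":
    # One outstanding request at an assumed 80 ns latency: latency bound.
    print(sustained_bandwidth_gbs(1, 80))
    # Ten outstanding requests hide most of that latency.
    print(sustained_bandwidth_gbs(10, 80))
```

With enough requests in flight, bandwidth rather than latency becomes the limit, which is why latency-tolerant throughput cores stress the memory system differently from latency cores.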

Page 23

•Low latency conflicts with

– Cheap

– Capacity

– Sometimes with

• Throughput

• Energy

• Reliability

Page 24

•Latency components:

– Memory cell access

– Data transfer

– Queuing

Page 25

•Latency impacted by cell access

•Denser/more efficient cells mean higher latency

– Small cells need sensitive circuits, which makes reads slow

– Writes even more complicated

[Diagram: one design point has + latency, – density, – cost, – energy; the other has – latency, + density, + cost, + energy]

Page 26

•Caching to the rescue

– Store some data somewhere fast
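
The benefit of caching is commonly summarized by the average memory access time (AMAT). A minimal sketch, with hypothetical latencies and miss rates chosen only for illustration:

```python
def amat_ns(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: every access pays the hit time,
    and the fraction that misses also pays the miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    # Assumed numbers: 2 ns fast store, 100 ns slow store.
    for miss_rate in (0.5, 0.1, 0.01):
        print(miss_rate, amat_ns(2.0, miss_rate, 100.0))
```

The formula also shows the limit of the approach: when locality is poor and the miss rate is high, the slow memory dominates again.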

Page 29

•Short distances improve latency

– It’s all about the wires

[Diagram: one option gives + latency, + energy, – capacity; the other gives – latency, – energy, + capacity]

Page 32

•Hierarchy to the rescue

[Figure: hierarchy diagram; label: Processor]

Page 33

•Memory system

– Modules, packages, components

– Arranged in a hierarchy

•Each memory component

– Has hierarchy

– Has caching/buffering

Page 34

•Latency impacted by queuing

Page 35

•Deep queues improve locality

– Higher throughput

– Higher efficiency

– Often hurts latency
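
The tradeoff above can be sketched with a toy memory controller. The scheduling policy here is an assumption (a simplified first-ready, first-come-first-served rule), and `row_hits` is a hypothetical helper, not something from the talk: a deeper queue lets the controller group requests to the open row, which raises the row-hit count but makes some requests wait longer.

```python
from collections import deque


def row_hits(rows, queue_depth):
    """Serve a stream of row IDs through a queue of the given depth.

    Policy (assumed): prefer the oldest queued request that targets the
    currently open row; otherwise serve the oldest request.
    Returns (hits, max_wait): the number of row hits, and the largest number
    of service slots any request waited after entering the queue.
    """
    pending = deque(rows)
    queue = []                      # list of (slot when enqueued, row)
    open_row, hits, max_wait, slot = None, 0, 0, 0
    while pending or queue:
        while pending and len(queue) < queue_depth:
            queue.append((slot, pending.popleft()))
        # Index of the oldest request to the open row, else 0 (oldest overall).
        pick = next((i for i, (_, r) in enumerate(queue) if r == open_row), 0)
        entered, row = queue.pop(pick)
        if row == open_row:
            hits += 1
        max_wait = max(max_wait, slot - entered)
        open_row = row
        slot += 1
    return hits, max_wait


if __name__ == "__main__":
    stream = [0, 1] * 4             # two rows, perfectly interleaved
    print(row_hits(stream, 1))      # no reordering possible
    print(row_hits(stream, 4))      # reordering groups accesses by row
```

On the interleaved stream, depth 1 gets no row hits and no waiting, while depth 4 turns most accesses into row hits at the cost of one request waiting several slots.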

Page 36

•Memory must be high throughput

– More and more latency hiding

Page 37

•Memory must be high throughput

[Figure: many chips vs. fewer chips]

– High capacity per pin requires efficient access

Page 39

•Packaging/interfaces improve throughput

– 3D integration

– Limited capacity

– More hierarchy levels

[Figure labels: TSV, silicon interposer]

Heterogeneity!

Page 41

•Latency + throughput + efficiency call for parallelism

– Access cell

Page 42

•Latency + throughput + efficiency call for parallelism

– Access many cells in parallel

– Many pins in parallel

Page 43

•Latency + throughput + efficiency call for parallelism

– Access even more cells in parallel

– Many pins in parallel over multiple cycles
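
One consequence of using many pins over multiple cycles is a minimum access granularity: every access moves pins times burst length bits. A small sketch; the pin counts and burst length below are illustrative assumptions (a 64-bit channel with a burst of 8 is a common DDR3-style configuration):

```python
def access_granularity_bytes(data_pins, burst_length):
    """Bytes moved per access: data pins in parallel times transfers per burst."""
    bits = data_pins * burst_length
    if bits % 8:
        raise ValueError("access is not a whole number of bytes")
    return bits // 8


if __name__ == "__main__":
    print(access_granularity_bytes(64, 8))  # eight x8 chips ganged together
    print(access_granularity_bytes(8, 8))   # a single x8 chip on its own
```

Ganging chips raises bandwidth per access but also forces every access to fetch a whole wide block, whether or not the application uses all of it.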

Page 45

•To recap

– Locality

– Hierarchy

– Parallelism

Page 46

•Memory must be reliable

– Written data must be recalled “correctly”

Page 47

•Reliability challenges

– “Natural” change in cell value

– Induced change in cell value

– Read errors

– Write errors

– Wearout and defects

Page 51

•Natural causes of value change

[Figures: two plots of probability correct vs. time]

•Decay

– Refresh / scrub

– Retention control

• Less dense

• Writes slower/higher-energy

• Can use different devices

•Flip

– ECC + scrub
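
The "ECC + scrub" pairing can be illustrated with the smallest classic single-error-correcting code. This is a minimal sketch that assumes Hamming(7,4) purely for brevity; the talk does not name a specific code here, and real memories use wider codes. The function names are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword (bit list, positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                       # index 0 unused
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_correct(bits):
    """Return (corrected codeword, syndrome); syndrome is the flipped position or 0."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos              # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1              # single-bit error: flip it back
    return code[1:], syndrome


def hamming74_decode(bits):
    """Extract the 4 data bits from an (already corrected) codeword."""
    code = [0] + list(bits)
    data = (code[3], code[5], code[6], code[7])
    return sum(bit << i for i, bit in enumerate(data))


def scrub(memory):
    """Walk every codeword, repair single-bit flips in place, return the repair count."""
    repaired = 0
    for addr, word in enumerate(memory):
        corrected, syndrome = hamming74_correct(word)
        if syndrome:
            memory[addr] = corrected
            repaired += 1
    return repaired
```

Scrubbing matters because a code like this corrects only one flip per word: periodic passes repair isolated flips before a second flip can accumulate in the same word.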

Page 52

•Induced change in value

– Writing or reading disturbs nearby cells

– Worse for writes

– Technology and design dependent

Page 53

•Read errors

– Because of small margins for efficiency

Page 55

•Write errors

– Writing implies changing a state or value

– Stochastic

– Waiting is inefficient and slow

– Write error control conflicts with control of other errors

[Figure: probability correct vs. time]

Page 56

•Wearout and defects

[Figures: four plots of probability correct vs. time]

Page 57

•Wearout and defects

– Sparing

– Extra margins

– Retirement

– Compensation (ECC)

Page 58

•Summary: no “good” memory

– It’s all about tradeoffs

Page 59

•Example: DRAM

– Efficient and reliable reads and writes

– Leaky

– Density problematic

– Fairly fast reads and writes

– Parallelism determined by cost

– Fast decay (short retention)

• High variability

– Rare, but important flips

– Very rare disturbs

– Slow wearout

Page 60

•Example: FLASH

– High-energy writes

– Very dense

– Slow reads

– Very slow writes

– Parallelism determined by technology

– Very slow decay (good retention)

– Flips only on periphery

– Write disturbs problematic

– Fast wearout

Page 61

•Example: PCM

– High-energy writes

– Dense

– Fast reads

– Fast-ish writes

– Parallelism determined by cost

– Slow decay (OK retention)

– Flips only on periphery

– Write disturbs problematic

– Fast wearout

Page 62

•Summary of heterogeneity:

• interrelated tradeoffs

– Efficiency

– Capacity

– Latency

– Bandwidth

– Parallelism (granularity)

– Reliability

Page 63

•The role of heterogeneous processors

• heterogeneous applications

Page 64

•Proportional compute

– Latency oriented

– Throughput oriented

– Specialized

Page 65

•Application characteristics

– Latency sensitive

– Bandwidth sensitive

– Few tasks or many tasks

Page 66

•Application compute styles

– Batch

– Realtime

– Response time

Page 67

•Application data usage

– Capacity

– Locality

– Precision

– Persistence

Page 68

•Memory must be heterogeneous, but the illusion of homogeneity is nice

Page 70

•The solution: adaptive proportional memory systems

Page 72

•Adaptive reliability

– Adapt the mechanism

– Adapt the error rate

• precision (stochastic precision)

Page 73

•Adaptive reliability

– By type

• Application guided

– By location (83)

• Machine-state guided

Page 74

•Example: Virtualized ECC

– Adapt the mechanism

Page 75

[Figure: a virtual address space mapped to physical memory through the virtual page to physical frame mapping. Virtual pages x, y, z map to page frames x, y, z; each frame holds data plus ECC. Protection levels: low, medium, high.]

Page 76

[Figure: a virtual address space mapped to physical memory through the virtual page to physical frame mapping, plus a physical frame to ECC page mapping. Virtual pages x, y, z map to page frames x, y, z holding data; ECC pages y and z hold ECC and stronger ECC. Protection levels: low, medium, high.]

Page 77

[Figure: a virtual address space mapped to physical memory through the virtual page to physical frame mapping, plus a physical frame to ECC page mapping. Page frames x, y, z hold data plus T1EC; ECC pages y and z hold T2EC for chipkill and T2EC for double chipkill. Protection levels: low, medium, high.]
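
The two mappings in the diagram can be sketched as a pair of tables. This is an illustrative model only: the class and method names are hypothetical, and pairing "low" with no T2EC, "medium" with chipkill, and "high" with double chipkill is an assumed reading of the slide labels.

```python
from dataclasses import dataclass, field

# Assumed pairing of protection level and second-tier code (T2EC).
T2EC_FOR_LEVEL = {"low": None, "medium": "chipkill", "high": "double chipkill"}


@dataclass
class VirtualizedEcc:
    page_table: dict = field(default_factory=dict)  # virtual page -> (frame, level)
    ecc_map: dict = field(default_factory=dict)     # data frame -> ECC page frame
    next_frame: int = 0

    def _alloc(self):
        """Hand out the next free physical frame number."""
        frame = self.next_frame
        self.next_frame += 1
        return frame

    def map_page(self, vpage, level):
        """Map a virtual page; allocate an ECC page only if the level needs T2EC."""
        frame = self._alloc()
        self.page_table[vpage] = (frame, level)
        if T2EC_FOR_LEVEL[level] is not None:
            self.ecc_map[frame] = self._alloc()
        return frame

    def lookup(self, vpage):
        """Return (data frame, ECC page or None, T2EC kind or None)."""
        frame, level = self.page_table[vpage]
        return frame, self.ecc_map.get(frame), T2EC_FOR_LEVEL[level]


if __name__ == "__main__":
    mem = VirtualizedEcc()
    for vpage, level in (("x", "low"), ("y", "medium"), ("z", "high")):
        mem.map_page(vpage, level)
    print(mem.lookup("x"), mem.lookup("y"), mem.lookup("z"))
```

The point of the indirection is that pages needing little protection consume no extra ECC storage, while the protection level of any page can change by updating a mapping rather than the hardware.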

Page 78

•Two-tiered ECC

[Figure: the write path performs T1EC/T2EC encoding; the read path performs T1EC decoding. Storage holds data, T1EC, and T2EC.]

Page 79

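[Figure: two-tiered ECC with a virtualized T2EC. The write path performs T1EC/T2EC encoding and T2EC mapping; the read path performs T1EC decoding. Storage holds data, T1EC, and the virtualized T2EC.]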

Page 80

•Caching works great

[Figure: normalized execution time (axis 0.98 to 1.02) for Chipkill Detect, Chipkill Correct, and 2 Chipkill Correct]

Page 81

•Bonus: flexibility of ECC

[Figure: normalized energy-delay product (axis 0.75 to 1.00) for Chipkill Detect, Chipkill Correct, and 2 Chipkill Correct]

Page 82

•Example: FREE-p

– Adapt to wearout state

Page 83

•FREE-p insights:

– Wearout is gradual

– Adapt ECC strength to wearout degree

– Limit ECC need with adaptive repair

• Retire and remap

Page 84

•Adapt ECC strength

– Multi-tiered ECC

Page 85

•Different cells need different protection

[Figure: percentage (0% to 100%) vs. years (0.0 to 9.3). Series: quick-ECC (no failure), quick-ECC (1 failure), quick-ECC (2 failures), slow-ECC (3 failures), slow-ECC (4 failures).]
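
The idea of matching ECC strength to wearout degree can be sketched as a simple selection rule. The quick-ECC and slow-ECC thresholds follow the chart legend (up to 2 and up to 4 failures); treating anything beyond 4 failures as "retire and remap" is an assumption made for this sketch, as is the function name.

```python
def choose_protection(known_failures):
    """Pick a protection mode from the number of worn-out cells in a block."""
    if known_failures < 0:
        raise ValueError("failure count cannot be negative")
    if known_failures <= 2:
        return "quick-ECC"          # fast path for lightly worn blocks
    if known_failures <= 4:
        return "slow-ECC"           # stronger, slower correction
    return "retire and remap"       # assumed fallback beyond the legend's range


if __name__ == "__main__":
    for failures in range(6):
        print(failures, choose_protection(failures))
```

Because wearout is gradual, most blocks spend most of their life on the fast path, and the slower mechanisms are paid for only where they are needed.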

Page 86

•Fine-grain retirement is key

[Figure: normalized capacity (0.0 to 1.2) vs. year (0 to 16) for No Correction, SEC64, ECP6, and FREE-p]

Page 87

•Adaptive fine-grained (FG) repair

[Figure: a data page and a page for remapping; each block carries a D/P flag and holds either data or a pointer (ptr)]
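
The D/P flag in the figure suggests a simple indirection: a block holds either data (D) or a pointer (P) to its replacement. The sketch below models that idea with a dictionary standing in for memory; the function names and the dictionary layout are hypothetical, and how a worn block reliably stores its pointer is not modelled.

```python
DATA, POINTER = "D", "P"


def retire_block(memory, addr, spare_addr, data):
    """Move a block's contents to a spare and leave a pointer behind."""
    memory[spare_addr] = (DATA, data)
    memory[addr] = (POINTER, spare_addr)


def read_block(memory, addr):
    """Read a block, following pointers until a data block is reached."""
    flag, payload = memory[addr]
    while flag == POINTER:
        flag, payload = memory[payload]
    return payload


if __name__ == "__main__":
    memory = {0: (DATA, "a"), 1: (DATA, "b")}
    retire_block(memory, 1, 100, "b")   # block 1 wore out; spare lives at 100
    print(read_block(memory, 0), read_block(memory, 1))
```

Remapping at block granularity means a single worn-out block costs one spare block rather than an entire page.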

Page 88

•Impact of repair and remap is gradual

[Figure: faulty blocks per page (0 to 64) vs. year (0 to 10)]

Page 89

•Adaptive granularity (parallelism)

Page 90

•Applications have diverse locality

[Figure: stacked bars (0% to 100%) per application, ordered from low spatial locality to high spatial locality: GUPS, canneal, linked list, mst, em3d, SSCA2, astar, omnetpp, bzip2, mcf, lbm, libquantum, OCEAN, streamcluster, hmmer, STREAM. Legend: 1~2, 3~6, 7~8. Annotation: ~50% is referenced.]

Page 91

•Coarse- or fine-grained memory access?

– CG-only: wide DRAM channel

– FG-only: many narrow channels

[Figure: the CG-only design has one ABUS/DBUS pair shared by nine x8 DRAM chips; the FG-only design has four channels, each with its own ABUS/DBUS and three x8 chips]

Page 92

•Coarse- or fine-grained memory access?

[Figure: layout of data (D0 to D7) and ECC (E0 to E7) across C0 to C7 for coarse-grained (CG) and fine-grained (FG) access]

Page 93

•Dynamically adapt granularity: best of both worlds

– Sector cache (8 8B sectors per line)

– Locality predictor (SLP)

– Co-scheduling CG/FG w/ split CG

– Sub-ranked DRAM w/ 2X ABUS: supports both CG and FG

[Figure: Core0 to CoreN-1, each with I$/D$ and L2, sharing a last-level cache and memory controller (MC) connected to DRAM]

Page 94

•Subranked memory

– Adapt parallelism

[Figure: nine x8 DRAM chips organized as sub-ranks SR0 to SR8; labels: DBUS, ABUS, Reg/Demux]

Page 95

•Sectored caches

– Track granularity

[Figure: three cache line organizations. Long cache line: one tag, D and V bits, data. Sector cache: one tag with sectors 0 to 3, each with its own D and V bits. Short cache line: one tag, D and V bits, data.]
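
A sector cache line shares one tag across several sectors while tracking validity and dirtiness per sector. A minimal sketch, assuming 8 sectors per line as in the "8 8B sectors per line" configuration mentioned earlier; the class and method names are hypothetical.

```python
class SectorLine:
    """One cache line: a single tag plus per-sector valid (V) and dirty (D) bits."""

    def __init__(self, tag, sectors=8):
        self.tag = tag
        self.valid = [False] * sectors
        self.dirty = [False] * sectors

    def fill(self, sector):
        """Mark a sector as fetched from memory (fine-grained fill)."""
        self.valid[sector] = True

    def fill_all(self):
        """Mark every sector as fetched (coarse-grained fill)."""
        self.valid = [True] * len(self.valid)

    def write(self, sector):
        """A write makes the sector both valid and dirty."""
        self.valid[sector] = True
        self.dirty[sector] = True

    def hit(self, tag, sector):
        """A hit needs a matching tag and a valid sector."""
        return tag == self.tag and self.valid[sector]

    def writeback_sectors(self):
        """Only dirty sectors need to be written back on eviction."""
        return [i for i, is_dirty in enumerate(self.dirty) if is_dirty]
```

Per-sector bits let the cache fetch and write back only the parts of a line that are actually used, while the shared tag keeps the tag overhead of a long line.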

Page 96

•Granularity information?

– Application provided (when allocated)

– System learned (when accessed)

Page 97

•Dynamically adapt granularity

[Figure: system throughput (0.0 to 8.0) for mcf x8, omnetpp x8, mst x8, em3d x8, canneal x8, SSCA2 x8, linked list x8, gups x8, MIX-A, MIX-B, MIX-C, and MIX-D under CG+ECC, FG+ECC, and AG+ECC]

•PROPORTIONALITY

Page 98

•Memory is heterogeneous

•Applications are heterogeneous

• this is a good thing

Page 99

•Adaptivity is key

Page 100

•Sometimes need application help

– Programming models and analyses crucial

Page 101

•Think abstractions rather than examples

Page 102

•Memory is heterogeneous

•Applications are heterogeneous

• this is a good thing

•Adaptivity is key

•Sometimes need application help

– Programming models and analyses crucial

•Think abstractions rather than examples
