a brief introduction to memory system proportionality and resilience · 2015-03-09 · a brief...
TRANSCRIPT
A brief introduction to
Memory system proportionality and resilience
Mattan Erez
The University of Texas at Austin
•With support from
– NSF Award #0954107
(c) Mattan Erez
•The constraints:
– Power/energy
– Time
– Money
– Correctness
(c) Mattan Erez
•Can’t work harder –
• have to work smarter
(c) Mattan Erez
•Can’t work harder –
• have to work smarter
•Efficient & Proportional
(c) Mattan Erez
•Multiple constraints and needs
• Heterogeneity
• asymmetry
• specialization
(c) Mattan Erez
•big.LITTLE and per core DvFS
– Static and dynamic asymmetry
•Accelerators
– Programmable and fixed-function
(c) Mattan Erez
•Throughput cores
•Latency cores
•Specialized cores
proportionality of compute
(c) Mattan Erez
Great, we solved the easy part: proportional compute
(c) Mattan Erez
•Programming is hard
•Memory is hard
(c) Mattan Erez
•The memory challenge (outline)
– Why do memory systems look the way they do?
• Efficiency and reliability
• Abstractions for architects
– Role of heterogeneity and programs
– Some interesting mechanisms
– Where do we go from here?
(c) Mattan Erez
•Storage device options abound
– SRAM
– DRAM
– FLASH
– STT-/M- RAM
– PCM, Memristor, RRAM, …
(c) Mattan Erez
•Architects should think in abstractions
– Research should be general
(c) Mattan Erez
•The fundamentals
•Cheap
•Energy efficient
•High capacity
•Low latency
•High throughput
•Reliable
(c) Mattan Erez
•Memory must be cheap
– Memory is already too expensive
• 15 – 50% of system cost (or more)
– But, margins razor thin (commodity)
(c) Mattan Erez
•Memory must be energy efficient
– Memory accounts for 15 – 60+% of “compute” energy and getting worse
• More capacity
• Proportional compute
(c) Mattan Erez
•Memory must be high capacity
– Memory footprints keep growing
• More interesting data
• More data
• More specialized data structures
(c) Mattan Erez
•Memory must be high capacityMany chips
(c) Mattan Erez
•Memory must be high capacityMany chips Denser mem,
Fewer chips
(c) Mattan Erez
•Memory must be high capacityMany chips Denser mem,
Fewer chips
Many dice,
Fewer chips
© Micron
(c) Mattan Erez
•Memory composed of
– Multiple modules
– Each module with N >=1 components
– Lots of cells per component
– Each cell >= 1 bit
(c) Mattan Erez
•Memory should be low latency
– Latency == performance
– Except when sufficient concurrency
(c) Mattan Erez
•Low latency conflicts with
– Cheap
– Capacity
– Sometimes with
• Throughput
• Energy
• Reliability
(c) Mattan Erez
•Latency components:
– Memory cell access
– Data transfer
– Queuing
(c) Mattan Erez
•Latency impacted by cell access
•denser/efficient higher latency
– Small cells sensitive circuits slow reads
– Writes even more complicated
+ latency
– density
– cost
– energy
– latency
+ density
+ cost
+ energy
(c) Mattan Erez
•Caching to the rescue
– Store some data somewhere fast
(c) Mattan Erez
•Short distances improve latency
– It’s all about the wires
(c) Mattan Erez
•Short distances improve latency
– It’s all about the wires
+ latency
+ energy
– latency
– energy
(c) Mattan Erez
•Short distances improve latency
– It’s all about the wires
+ latency
+ energy
– capacity
– latency
– energy
+ capacity
(c) Mattan Erez
•Hierarchy to the rescue
(c) Mattan Erez
•Hierarchy to the rescue
Processor
(c) Mattan Erez
•Hierarchy to the rescue
Processor
(c) Mattan Erez
•Memory system
– Modules, packages, components
– Arranged in a hierarchy
•Each memory component
– Has hierarchy
– Has caching/buffering
(c) Mattan Erez
•Latency impacted by queuing
(c) Mattan Erez
•Deep queues improve locality
– Higher throughput
– Higher efficiency
– Often hurts latency
(c) Mattan Erez
•Memory must be high throughput
– More and more latency hiding
(c) Mattan Erez
•Memory must be high throughputMany chips Fewer chips
High capacity per pin requires efficient access
(c) Mattan Erez
•Packaging/interfaces improve throughput
– 3D integration
– Limited capacity
TSVSilicon
Interposer
(c) Mattan Erez
•Packaging/interfaces improve throughput
– 3D integration
– Limited capacity
– More hierarchy levels
TSVSilicon
Interposer
Heterogeneity!
(c) Mattan Erez
•Latency + throughput + efficiency parallelism
(c) Mattan Erez
•Latency + throughput + efficiency parallelism
Access cell
(c) Mattan Erez
•Latency + throughput + efficiency parallelism
Access many
cells in parallel
Many pins
in parallel
(c) Mattan Erez
•Latency + throughput + efficiency parallelism
Access even more
cells in parallel
Many pins
in parallel over
multiple cycles
(c) Mattan Erez
•Latency + throughput + efficiency parallelism
Access even more
cells in parallel
Many pins
in parallel over
multiple cycles
(c) Mattan Erez
•To recap
– Locality
– Hierarchy
– Parallelism
(c) Mattan Erez
•Memory must be reliable
– Written data must be recalled “correctly”
(c) Mattan Erez
•Reliability challenges
– “Natural” change in cell value
– Induced change in cell value
– Read errors
– Write errors
– Wearout and defects
(c) Mattan Erez
•Natural causes of value changeP
rob
ab
ility
co
rre
ct
Time•Decay
– Refresh
(c) Mattan Erez
•Natural causes of value changeP
rob
ab
ility
co
rre
ct
Time
Pro
ba
bili
ty c
orr
ec
t
Time•Decay
– Refresh
•Flip
– ECC + scrub
(c) Mattan Erez
•Natural causes of value changeP
rob
ab
ility
co
rre
ct
Time
Pro
ba
bili
ty c
orr
ec
t
Time•Decay
– Refresh / scrub
•Flip
– ECC + scrub
(c) Mattan Erez
•Natural causes of value changeP
rob
ab
ility
co
rre
ct
Time
Pro
ba
bili
ty c
orr
ec
t
Time
– Retention control
• Less dense
• Writes slower/higher-energy
• Can use different devices
•Decay
– Refresh / scrub
•Flip
– ECC + scrub
(c) Mattan Erez
•Induced change in value
– Writing or reading disturbs nearby cells
– Worse for writes
– Technology and design dependent
(c) Mattan Erez
•Read errors
– Because of small margins for efficiency
(c) Mattan Erez
•Write errors
– Writing implies changing a state or value
– Stochastic
Pro
ba
bili
ty c
orr
ec
t
Time
(c) Mattan Erez
•Write errors
– Writing implies changing a state or value
– Stochastic
– Waiting inefficient and slow
– Write error control conflicts w/ other errors
Pro
ba
bili
ty c
orr
ec
t
Time
(c) Mattan Erez
•Wearout and defectsP
rob
ab
ility
co
rre
ct
Time
Pro
ba
bili
ty c
orr
ec
t
Time
Pro
ba
bili
ty c
orr
ec
t
Time
Pro
ba
bili
ty c
orr
ec
t
Time
(c) Mattan Erez
•Wearout and defects
– Sparing
– Extra margins
– Retirement
– Compensation (ECC)
(c) Mattan Erez
•Summary: no “good” memory
– It’s all about tradeoffs
(c) Mattan Erez
•Example: DRAM
– Efficient and reliable reads and writes
– Leaky
– Density problematic
– Fairly fast reads and writes
– Parallelism determined by cost
– Fast decay (short retention)
• High variability
– Rare, but important flips
– Very rare disturbs
– Slow wearout
(c) Mattan Erez
•Example: FLASH
– High-energy writes
– Very dense
– Slow reads
– Very slow writes
– Parallelism determined by technology
– Very slow decay (good retention)
– Flips only on periphery
– Write disturbs problematic
– Fast wearout
(c) Mattan Erez
•Example: PCM
– High-energy writes
– Dense
– Fast reads
– Fast-ish writes
– Parallelism determined by cost
– Slow decay (OK retention)
– Flips only on periphery
– Write disturbs problematic
– Fast wearout
(c) Mattan Erez
•Summary of heterogeneity:
• interrelated tradeoffs
– Efficiency
– Capacity
– Latency
– Bandwidth
– Parallelism (granularity)
– Reliability
(c) Mattan Erez
•The role of heterogeneous processors
• heterogeneous applications
(c) Mattan Erez
•Proportional compute
– Latency oriented
– Throughput oriented
– Specialized
(c) Mattan Erez
•Application characteristics
– Latency sensitive
– Bandwidth sensitive
– Few tasks or many tasks
(c) Mattan Erez
•Application compute styles
– Batch
– Realtime
– Response time
(c) Mattan Erez
•Application data usage
– Capacity
– Locality
– Precision
– Persistence
(c) Mattan Erez
•Memory must be heterogeneousbut illusion of homogeneity is nice
(c) Mattan Erez
•The solution:Adaptive proportional memory systems
(c) Mattan Erez
•The solution:Adaptive proportional memory systems
(c) Mattan Erez
•Adaptive reliability
(c) Mattan Erez
•Adaptive reliability
– Adapt the mechanism
– Adapt the error rate
• precision (stochastic precision)
(c) Mattan Erez
•Adaptive reliability
– By type
• Application guided
– By location (83)
• Machine-state guided
(c) Mattan Erez
•Example: Virtualized ECC
– Adapt the mechanism
(c) Mattan Erez
Physical Memory
Data ECC
Virtual Address space
LowVirtual Page to Physical Frame mapping
Page frame – x
Page frame – y
Page frame – z
Virtual page – x
Virtual page – y
Virtual page – z
Protection Level
Medium
High
(c) Mattan Erez
Physical Memory
Data
Virtual Address space
LowVirtual Page to Physical Frame mapping
Physical Frame to ECC Page mapping
ECC
StrongerECC
Page frame – x
Page frame – y
Page frame – z
ECC page – y
ECC page – z
Virtual page – x
Virtual page – y
Virtual page – z
Protection Level
Medium
High
(c) Mattan Erez
Physical Memory
Data T1EC
Virtual Address space
LowVirtual Page to Physical Frame mapping
Physical Frame to ECC Page mapping
T2EC for Chipkill
T2EC for DoubleChipkill
Page frame – x
Page frame – y
Page frame – z
ECC page – y
ECC page – z
Virtual page – x
Virtual page – y
Virtual page – z
Protection Level
Medium
High
(c) Mattan Erez
Two-Tiered ECC
Write Read
T1ECDecoding
Data T1EC
T1EC/T2ECEncoding
T2EC
(c) Mattan Erez
(c) Mattan Erez
Write Read
T1ECDecoding
Data T1EC
Virtualized
T2EC
T1EC/T2ECEncoding
T2ECMapping
•Caching work great
0.98
0.99
1.00
1.01
1.02
Chipkill Detect ChipkillCorrect
2 ChipkillCorrect
Normalized Execution Time
(c) Mattan Erez
•Bonus: flexibility of ECC
0.75
0.80
0.85
0.90
0.95
1.00
Chipkill Detect Chipkill Correct 2 ChipkillCorrect
Normalized Energy-Delay Product
(c) Mattan Erez
•Example: FREE-p
– Adapt to wearout state
(c) Mattan Erez
•FREE-p insights:
– Wearout is gradual
– Adapt ECC strength to wearout degree
– Limit ECC need with adaptive repair
• Retire and remap
(c) Mattan Erez
•Adapt ECC strength
– Multi-tiered ECC
(c) Mattan Erez
•Different cells need different protection
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
0.0 0.9 1.9 2.8 3.7 4.6 5.6 6.5 7.4 8.4 9.3
Years
slow-ECC (4 failures)
slow-ECC (3 failures)
quick-ECC (2 failures)
quick-ECC (1 failure)
quick-ECC (no failure)
(c) Mattan Erez
•Fine-grain retirement is key
0.0
0.2
0.4
0.6
0.8
1.0
1.2
0 2 4 6 8 10 12 14 16
No
rmali
zed
Cap
acit
y
Year
No Correction
SEC64
ECP6
FREE-p
(c) Mattan Erez
•Adaptive FG repair
Page for remappingD/P flag
Data page
ptr
data
(c) Mattan Erez
•Impact of repair and remap is gradual
0
8
16
24
32
40
48
56
64
0 2 4 6 8 10
Fau
lty b
locks /
pag
e
Year
(c) Mattan Erez
•Adaptive granularity (parallelism)
(c) Mattan Erez
•Applications have diverse locality
0%
25%
50%
75%
100%
GU
PS
ca
nn
ea
l
linke
d li
st
mst
em
3d
SSC
A2
ast
ar
om
ne
tpp
bzi
p2
mc
f
lbm
libq
ua
ntu
m
OC
EA
N
stre
am
clu
ste
r
hm
me
r
STR
EA
M
Low spatial locality High spatial locality~50% is referenced
1~2 3~6 7~8
(c) Mattan Erez
•Coarse- or fine-grained memory access?
CG-Only: Wide DRAM channel
FG-Only: Many narrow channels
ABUS
x8 x8 x8
DBUS
ABUS
x8 x8 x8
DBUS
ABUS
x8 x8 x8
DBUS
ABUS
x8 x8 x8
DBUS
ABUS
DBUS
x8 x8 x8 x8 x8 x8 x8 x8 x8
(c) Mattan Erez
•Coarse- or fine-grained memory access?
D0 E0
DataC
C0
C1
D1 E1 D2 E2 D3 E3C2
C3
C4
D4 E4
CG
FGC5
C6
C7
D5 E5 D6 E6 D7 E7
ECC
D0 E0
DataC
C0
CG
FG
ECC
(c) Mattan Erez
Dynamically adapt granularity best of both worlds
ProcessorCore
$I$D
L2
Last Level Cache
MC
DRAM
Core0
Core1
CoreN-1
…
SLP
Co-scheduling CG/FG w/ split CG
Sector cache (8 8B sectors per line)
Sub-Ranked DRAM w/ 2X ABUS Support both CG and FG
Locality Predictor
(c) Mattan Erez
•Subranked memory
– Adapt parallelism
(c) Mattan Erez
DBUS
x8 x8 x8 x8 x8 x8 x8 x8 x8
ABUS
SR0 SR1 SR2 SR3 SR4 SR5 SR6 SR7 SR8
Reg/D
em
ux
•Sectored caches
– Track granularity
(c) Mattan Erez
tag sector 0D V sector 1D V sector 2D V sector 3D V
tag dataD V
Long cache line
Sector cache
tag dataD V
Short cache line
•Granularity information?
– Application provided (when allocated)
– System learned (when accessed)
(c) Mattan Erez
0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
mcf x8 omnetpp
x8
mst x8 em3d x8 canneal
x8
SSCA2 x8 linked list
x8
gups x8 MIX-A MIX-B MIX-C MIX-D
Sy
ste
m T
hro
ug
hp
ut
CG+ECC
FG+ECC
AG+ECC
•Dynamically adapt granularity
•PROPORTIONALITY
(c) Mattan Erez
•Memory is heterogeneous
•Applications are heterogeneous
• this is a good thing
(c) Mattan Erez
•Adaptivity is key
(c) Mattan Erez
•Sometimes need application help
– Programming models and analyses crucial
(c) Mattan Erez
•Think abstractions rather than examples
(c) Mattan Erez
•Memory is heterogeneous
•Applications are heterogeneous
• this is a good thing
•Adaptivity is key
•Sometimes need application help
– Programming models and analyses crucial
•Think abstractions rather than examples
(c) Mattan Erez