Emerging NVM Memory Technologies
Yuan Xie, Associate Professor
The Pennsylvania State University, Department of Computer Science & Engineering
www.cse.psu.edu/
Position Statement
- Emerging NVMs are very attractive: they combine the speed of SRAM, the density of DRAM, and the non-volatility of Flash memory.
  - Attractive features: high density, low leakage, non-volatility
  - Undesirable features:
    - Write-related: long write latency, high write energy, low endurance (e.g., PCRAM)
    - Cost (needs large-volume production)
- Solution: hybrid cache/memory/storage + 3D?
- Enabling unique applications
Outline
- Introduction
- Modeling
  - MRAM/PCRAM modeling
- Architecture
  - MRAM stacking
  - HCA: Hybrid Cache Architecture
  - Hybrid storage system
- Application
  - Exascale computing
- Conclusion
Traditional Memory Hierarchies
Latency (cycles):
- On-chip memory (SRAM): 1~30
- Off-chip memory (DRAM): 100~300
- Solid State Disk (Flash Memory): 25,000~2,000,000
- Secondary Storage (HDD): >5,000,000
Large latency gap between off-chip memory and secondary storage.
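The size of the gap can be made concrete by dividing each level's latency by the one above it. A minimal sketch, assuming each cycle range is represented by its upper end (the slide gives only ranges, and the HDD figure is a lower bound):

```python
# Latency per level in CPU cycles, from the slide. Ranges are represented by
# their upper ends (an assumption); the HDD value is the slide's lower bound.
LATENCY_CYCLES = {
    "SRAM (on-chip)": 30,
    "DRAM (off-chip)": 300,
    "SSD (flash)": 2_000_000,
    "HDD": 5_000_000,
}


def gap_ratios(levels):
    """Return (upper level, lower level, latency ratio) for each adjacent pair."""
    names = list(levels)
    return [
        (upper, lower, levels[lower] / levels[upper])
        for upper, lower in zip(names, names[1:])
    ]


if __name__ == "__main__":
    for upper, lower, ratio in gap_ratios(LATENCY_CYCLES):
        print(f"{upper} -> {lower}: {ratio:,.1f}x")
    dram_to_hdd = LATENCY_CYCLES["HDD"] / LATENCY_CYCLES["DRAM (off-chip)"]
    print(f"DRAM -> HDD without an SSD: {dram_to_hdd:,.1f}x")
```

Under these assumptions the DRAM-to-HDD step is about 16,700x, which is the large latency gap on the slide; an SSD splits it into two very uneven steps.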
Emerging Memory Technologies
- FeRAM (Ferroelectric RAM): Toshiba FeRAM (2009)
- MRAM (Magnetic RAM): Everspin MRAM (2008)
- Memristor (Resistive RAM): HP Labs Memristor (2009)
- PCRAM (Phase-Change RAM): Samsung PCRAM (2008)
NVRAM Comparison
Courtesy: Motoyuki Ooishi
FeRAM, MRAM, and PCRAM each combine the advantages of SRAM, DRAM, and flash.
This is a good opportunity to rethink memory hierarchy design.
Traditional Memory Hierarchies
Latency (cycles):
- On-chip memory (SRAM): ~10
- Off-chip memory (DRAM): ~100
- Solid State Disk (SSD): 25,000~2,000,000
- Secondary Storage (HDD): >5,000,000
Emerging Non-volatile Memory (NVM): Magnetic RAM (MRAM), Phase-change RAM (PCRAM)
What is the impact of emerging NVM technologies on computer memory hierarchies?
PCRAMsim Model
- Developed on the basis of CACTI
- CACTI models SRAM and DRAM caches
- CACTI does NOT support PCRAM
[Figure: CACTI-modeled memory subarray: a 2D array of memory cells surrounded by peripheral circuitry (row decoders, wordline drivers, precharge & equalization, bitline mux, sense amplifiers, sense amplifier mux, output/write drivers)]
PCRAMsim made 3 modifications at the subarray level.
SRAM vs. MRAM
                  SRAM        MRAM
Area (65 nm)      3.66 mm²    3.30 mm²
Capacity          128 KB      512 KB
Read latency      2.25 ns     2.32 ns      (fast read)
Write latency     2.26 ns     11.02 ns     (slow write)
Read energy       0.90 nJ     0.86 nJ      (low read energy)
Write energy      0.80 nJ     5.00 nJ      (high write energy)
Pros: low leakage power, high density. Cons: long write latency and large write energy.

Cache configuration               Leakage power
2 MB (16 x 128 KB) SRAM cache     2.09 W
8 MB (16 x 512 KB) MRAM cache     0.26 W
High density and low leakage: replace SRAM caches with MRAM?
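The two tables can be condensed into a few ratios. A small sketch with every number copied from the tables above; the `Bank` record and helper names are illustrative only, and leakage per MB simply divides each cache's leakage by its capacity:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bank:
    """One 65 nm cache bank, using the SRAM vs. MRAM table."""
    name: str
    area_mm2: float
    capacity_kb: int
    read_ns: float
    write_ns: float
    read_nj: float
    write_nj: float


SRAM = Bank("SRAM", 3.66, 128, 2.25, 2.26, 0.90, 0.80)
MRAM = Bank("MRAM", 3.30, 512, 2.32, 11.02, 0.86, 5.00)

# Whole-cache leakage (W) and capacity (MB) from the leakage table.
LEAKAGE_W = {"SRAM": 2.09, "MRAM": 0.26}
CACHE_MB = {"SRAM": 2, "MRAM": 8}


def density_kb_per_mm2(bank):
    return bank.capacity_kb / bank.area_mm2


def leakage_w_per_mb(name):
    return LEAKAGE_W[name] / CACHE_MB[name]


if __name__ == "__main__":
    print(f"density (MRAM/SRAM): {density_kb_per_mm2(MRAM) / density_kb_per_mm2(SRAM):.2f}x")
    print(f"write latency (MRAM/SRAM): {MRAM.write_ns / SRAM.write_ns:.2f}x")
    print(f"write energy (MRAM/SRAM): {MRAM.write_nj / SRAM.write_nj:.2f}x")
    print(f"leakage per MB (SRAM/MRAM): {leakage_w_per_mb('SRAM') / leakage_w_per_mb('MRAM'):.1f}x")
```

MRAM comes out at about 4.4x the density per mm², about 4.9x the write latency, 6.25x the write energy, and roughly 32x less leakage per MB.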
Direct Replacement
Replace SRAM with MRAM of the same area. The number of banks is kept the same, so the capacity of the L2 cache increases by 4X.
The L2 cache miss rate is reduced. How about performance?
[Figure: L2 Cache Miss Rate]
IPC Comparison (Direct Replacement)
The last four benchmarks have high write intensities (see Observation 1).
[Figure: IPC (SRAM vs. MRAM)]
Observation 1
Replacing SRAM L2 caches directly with MRAM can reduce the L2 cache miss rate. However, the long write latency of the MRAM cache has a negative impact on performance, and when the write intensity is high, it even results in performance degradation.
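Observation 1 can be illustrated with a simple average-access-time model: each L2 access pays the read or write latency, plus the main-memory penalty on a miss. This is a toy model, not the simulator behind these results; it ignores write buffering and bank contention, and the miss rates and 100 ns memory penalty are hypothetical values chosen only to show the shape of the trade-off.

```python
# Bank latencies from the SRAM vs. MRAM table, as (read_ns, write_ns).
SRAM_RW = (2.25, 2.26)
MRAM_RW = (2.32, 11.02)

# Hypothetical values, not from the slides: the 4x larger MRAM cache is assumed
# to cut the L2 miss rate from 10% to 6%, with a 100 ns main-memory penalty.
SRAM_MISS, MRAM_MISS, MEM_NS = 0.10, 0.06, 100.0


def avg_l2_time_ns(rw, write_frac, miss_rate, mem_penalty_ns=MEM_NS):
    """Average time per L2 access: the read/write-weighted hit latency
    plus the miss rate times the main-memory penalty."""
    t_read, t_write = rw
    hit_time = (1.0 - write_frac) * t_read + write_frac * t_write
    return hit_time + miss_rate * mem_penalty_ns


def break_even_write_frac():
    """Write fraction at which both caches have the same average access time.
    Each average is linear in the write fraction w: a + b * w."""
    a_s = SRAM_RW[0] + SRAM_MISS * MEM_NS
    b_s = SRAM_RW[1] - SRAM_RW[0]
    a_m = MRAM_RW[0] + MRAM_MISS * MEM_NS
    b_m = MRAM_RW[1] - MRAM_RW[0]
    return (a_s - a_m) / (b_m - b_s)


if __name__ == "__main__":
    for w in (0.0, 0.1, 0.3, 0.5):
        s = avg_l2_time_ns(SRAM_RW, w, SRAM_MISS)
        m = avg_l2_time_ns(MRAM_RW, w, MRAM_MISS)
        print(f"write fraction {w:.1f}: SRAM {s:5.2f} ns, MRAM {m:5.2f} ns")
    print(f"MRAM loses above a write fraction of {break_even_write_frac():.2f}")
```

With these assumed numbers MRAM wins below a write fraction of about 0.45 and loses above it, which mirrors the slide's point that write-intensive benchmarks lose performance.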
Direct MRAM replacement may harm performance. What about power consumption?
Power Analysis (Direct Replacement)
For some workloads, MRAM dynamic power dominates! (see Observation 2)
[Figure: Total Power (SRAM vs. MRAM), normalized to 2M-SRAM-SNUCA, split into MRAM leakage power and MRAM dynamic power]
Observation 2
Replacing SRAM L2 caches directly with MRAM can greatly reduce the leakage power.
When the write intensity is high, the dynamic power increases significantly because of the high write energy of the MRAM cache.
Question: How can we improve performance and further reduce the power of the MRAM cache?
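Observation 2 can be restated as a break-even write rate: MRAM saves leakage but pays extra energy on every write. A minimal sketch, assuming every L2 access costs exactly one bank-access energy from the table (a simplification) and a hypothetical read rate; leakage and energies are the slide's numbers.

```python
# Whole-cache leakage (W) from the leakage table, and per-access (read, write)
# energies (nJ) from the SRAM vs. MRAM bank table.
LEAKAGE_W = {"SRAM": 2.09, "MRAM": 0.26}
ENERGY_NJ = {"SRAM": (0.90, 0.80), "MRAM": (0.86, 5.00)}


def total_power_w(tech, reads_per_s, writes_per_s):
    """Leakage plus dynamic power. Assumes every L2 access costs exactly one
    bank-access energy from the table (a simplification)."""
    e_read, e_write = ENERGY_NJ[tech]
    dynamic_w = (reads_per_s * e_read + writes_per_s * e_write) * 1e-9
    return LEAKAGE_W[tech] + dynamic_w


def break_even_writes_per_s(reads_per_s):
    """Write rate above which the MRAM cache draws more total power than SRAM.
    Solves L_s + r*Er_s + x*Ew_s = L_m + r*Er_m + x*Ew_m for x."""
    leak_gap = LEAKAGE_W["SRAM"] - LEAKAGE_W["MRAM"]
    read_gap = (ENERGY_NJ["SRAM"][0] - ENERGY_NJ["MRAM"][0]) * 1e-9 * reads_per_s
    write_gap = (ENERGY_NJ["MRAM"][1] - ENERGY_NJ["SRAM"][1]) * 1e-9
    return (leak_gap + read_gap) / write_gap


if __name__ == "__main__":
    reads = 1e8  # hypothetical L2 read rate (reads per second)
    x = break_even_writes_per_s(reads)
    print(f"MRAM uses more total power above ~{x:.2e} writes/s")
```

Below the break-even rate (a few hundred million writes per second under these assumptions) the leakage savings dominate; above it the write energy wins.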
SRAM-MRAM Hybrid L2 Cache
Using a hybrid L2 cache, MRAM write intensities are reduced.
[Figure: Write Intensity (Pure vs. Hybrid)]
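The placement policy behind the hybrid L2 is not spelled out on this slide. One common way to get the effect, sketched below purely as an assumption, is to keep a few SRAM ways per set and migrate write-hot lines into them. `HybridSet`, the migration threshold, and the synthetic trace are all hypothetical; the sketch only shows how redirecting repeated writes to SRAM cuts MRAM writes.

```python
from collections import OrderedDict


class HybridSet:
    """One cache set with a few SRAM ways and more MRAM ways.

    Hypothetical policy: new lines are filled into MRAM; a line that is written
    `threshold` times while in MRAM migrates to SRAM, and the LRU SRAM line is
    moved back to MRAM. Writes that hit in SRAM never touch MRAM cells.
    """

    def __init__(self, sram_ways, mram_ways, threshold=2):
        self.sram = OrderedDict()  # tag -> None, ordered from LRU to MRU
        self.mram = OrderedDict()  # tag -> writes received while in MRAM
        self.sram_ways = sram_ways
        self.mram_ways = mram_ways
        self.threshold = threshold
        self.mram_writes = 0  # MRAM cell writes: fills, stores, and demotions

    def access(self, tag, is_write):
        if tag in self.sram:  # SRAM hit: no MRAM write at all
            self.sram.move_to_end(tag)
            return
        if tag not in self.mram:  # miss: fill the line into MRAM
            if len(self.mram) >= self.mram_ways:
                self.mram.popitem(last=False)  # evict the LRU MRAM line
            self.mram[tag] = 0
            self.mram_writes += 1
        self.mram.move_to_end(tag)
        if not is_write:
            return
        self.mram_writes += 1
        self.mram[tag] += 1
        if self.sram_ways > 0 and self.mram[tag] >= self.threshold:
            self._migrate_to_sram(tag)

    def _migrate_to_sram(self, tag):
        del self.mram[tag]
        if len(self.sram) >= self.sram_ways:
            victim, _ = self.sram.popitem(last=False)  # demote the LRU SRAM line
            self.mram[victim] = 0
            self.mram_writes += 1
        self.sram[tag] = None


def mram_writes(sram_ways, mram_ways, rounds=1000):
    """Synthetic trace: one write-hot line plus eight read-mostly lines."""
    cache_set = HybridSet(sram_ways, mram_ways)
    for i in range(rounds):
        cache_set.access(0, is_write=True)
        cache_set.access(1 + i % 8, is_write=False)
    return cache_set.mram_writes


if __name__ == "__main__":
    print("pure MRAM (16 ways):", mram_writes(0, 16))
    print("hybrid (2 SRAM + 14 MRAM):", mram_writes(2, 14))
```

On this trace the pure MRAM set performs 1,009 MRAM writes (9 fills plus 1,000 stores), while the hybrid set performs 11, because the hot line moves to SRAM after its second write.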
IPC Result
The performance degradation is eliminated, and the average IPC increases by 15%.
[Figure: IPC Comparison, direct replacement vs. with read-preemptive]
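The read-preemptive scheme named in the legend is not detailed on these slides. A toy reading of the idea, sketched below under stated assumptions: reads are scheduled ahead of queued writes, and a read may abort an MRAM write that is less than halfway done, with the write retried afterwards. The 2- and 11-cycle latencies, the 50% threshold, and the four-request trace are assumptions chosen for illustration.

```python
from collections import deque

# Assumed latencies in cycles, roughly matching the MRAM bank table (ns).
READ_CYCLES, WRITE_CYCLES = 2, 11


def avg_read_latency(requests, preempt, max_progress=0.5):
    """Cycle-level model of one MRAM bank.

    requests: list of (arrival_cycle, "R" or "W"), sorted by arrival.
    Pending reads are always started before pending writes. With preempt=True,
    a waiting read also aborts an in-flight write that has completed less than
    `max_progress` of its cycles; the aborted write is retried from scratch.
    Returns the mean read latency (arrival to completion) in cycles.
    """
    pending_reads, pending_writes = deque(), deque()
    current = None  # (kind, arrival_cycle, remaining_cycles)
    read_latencies = []
    i, t = 0, 0
    while i < len(requests) or pending_reads or pending_writes or current:
        while i < len(requests) and requests[i][0] <= t:  # enqueue arrivals
            arrival, kind = requests[i]
            (pending_reads if kind == "R" else pending_writes).append(arrival)
            i += 1
        if preempt and current and current[0] == "W" and pending_reads:
            if WRITE_CYCLES - current[2] < max_progress * WRITE_CYCLES:
                pending_writes.appendleft(current[1])  # retry the write later
                current = None
        if current is None and pending_reads:
            current = ("R", pending_reads.popleft(), READ_CYCLES)
        elif current is None and pending_writes:
            current = ("W", pending_writes.popleft(), WRITE_CYCLES)
        t += 1  # advance one cycle
        if current:
            kind, arrival, remaining = current
            if remaining == 1:
                if kind == "R":
                    read_latencies.append(t - arrival)
                current = None
            else:
                current = (kind, arrival, remaining - 1)
    if not read_latencies:
        return 0.0
    return sum(read_latencies) / len(read_latencies)


if __name__ == "__main__":
    trace = [(0, "W"), (1, "R"), (20, "W"), (22, "R")]
    print("no preemption:  ", avg_read_latency(trace, preempt=False))
    print("read-preemptive:", avg_read_latency(trace, preempt=True))
```

On this trace preemption cuts the mean read latency from 11.5 to 2 cycles at the cost of delaying the two writes; a real design would also need a bound on repeated aborts so that writes are not starved.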
Power Result
The dynamic power is reduced, and the average total power is further reduced by 17%.
[Figure: Total Power Comparison, 8M-MRAM-DNUCA direct replacement vs. with read-preemptive]
Comparisons
                  SRAM        eDRAM       MRAM                             PRAM
Density (ratio)   Low (1)     High (4)    High (4)                         High (16)
Speed             Very fast   Fast        Fast for read; slow for write    Slow for read; very slow for write
Dynamic power     Low         Medium      Low for read; high for write     Medium for read; high for write
Leakage power     High        Medium      Low                              Low
Non-volatility    No          No          Yes                              Yes
Scalability       Yes         Yes         Yes                              Yes
Endurance         10^16       10^16       >10^15                           10^8
Higher density reduces the cache miss rate but increases hit latency; low leakage power comes with high dynamic power.
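The endurance row matters most for write-heavy data. A rough sketch of what it implies, assuming a single hot cache line rewritten at a constant, hypothetical rate with no wear leveling (MRAM's ">10^15" is treated as a lower bound):

```python
# Write endurance per cell, from the comparison table. MRAM is listed as
# ">10^15", so 1e15 is a lower bound here.
ENDURANCE = {"SRAM": 1e16, "eDRAM": 1e16, "MRAM": 1e15, "PRAM": 1e8}
SECONDS_PER_YEAR = 365 * 24 * 3600


def lifetime_seconds(tech, writes_per_second):
    """Time until a cell rewritten at a constant rate reaches its endurance
    limit, with no wear leveling."""
    return ENDURANCE[tech] / writes_per_second


if __name__ == "__main__":
    rate = 1e6  # hypothetical: one hot cache line rewritten every microsecond
    for tech in ENDURANCE:
        secs = lifetime_seconds(tech, rate)
        print(f"{tech:6s} {secs:10.3g} s = {secs / SECONDS_PER_YEAR:10.3g} years")
```

At that rate a PRAM line would wear out in about 100 seconds, while MRAM would last at least about 32 years, which is why write-heavy data should not sit in PRAM without wear leveling or a hybrid placement.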
No Such “Ideal” (One-Size-Fits-All) Memory
[Figure: Normalized IPC (top) and normalized power split into static and dynamic (bottom), per benchmark and geometric mean, for 1M-SRAM, 4M-DRAM, 4M-MRAM, and 16M-PRAM]
A hybrid cache may outperform its single-technology counterparts.
No single memory technology has the best power-performance characteristics.
HCA: Hybrid Cache Architecture
[Figure: Cache design scenarios relative to the baseline, a 2D 8-core CMP with 3-level SRAM caches (each core with L1s, L2, and L3). Panels A to E are labeled LHCA, RHCA, and 3DHCA and span a 2D design scenario and a 3D design scenario (3D layers 1 and 2). Configurations shown, each with a core and its L1s:
- L2 (SRAM) + L3 (eDRAM/MRAM/PRAM)
- L2 fast (SRAM) + L2 slow (eDRAM/MRAM/PRAM)
- L2 (SRAM) + L3 (eDRAM/MRAM) + L4 (PRAM)
- L2 fast (SRAM) + L2 slow (eDRAM/MRAM) + L3 (PRAM)
- L2 fast (SRAM) + L2 middle (eDRAM/MRAM) + L2 slow (PRAM)
Panel captions: "A cache design scenario with 3D chip integration", "Flattening L2 and L3 with hybrid cache", "Flattening L3 and L4 with hybrid cache", "Flattening L2, L3 and L4 with hybrid cache".]
Hybrid Storage (HPCA 2010)
[Figure: Hybrid architecture, physical and structural views: a data buffer in memory; a data region in NAND flash organized into erase units; a log region in PRAM with in-place updating at sector (512-byte) granularity]
How can the log region be managed efficiently?
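The log-management question can be made concrete with a toy model. The sketch below is an illustration under assumptions, not the HPCA 2010 design: the class, the merge-when-full rule, and the one-erase-per-dirty-unit cost are hypothetical, and sector contents are shortened to a few bytes rather than 512.

```python
from typing import Dict, Optional


class HybridStorage:
    """Toy model of a NAND flash data region plus a PRAM log region.

    Hypothetical policy: sector updates go to the PRAM log. Because PRAM can be
    updated in place, rewriting a sector that is already logged overwrites its
    log entry instead of taking a new one. When the log is full, logged sectors
    are merged back into flash at a cost of one erase per affected erase unit.
    """

    def __init__(self, sectors_per_erase_unit, log_capacity):
        self.sectors_per_erase_unit = sectors_per_erase_unit
        self.log_capacity = log_capacity
        self.flash: Dict[int, bytes] = {}  # logical sector -> data in flash
        self.log: Dict[int, bytes] = {}    # logical sector -> newest data in PRAM
        self.erases = 0

    def write(self, sector: int, data: bytes) -> None:
        if sector not in self.log and len(self.log) >= self.log_capacity:
            self.merge()
        self.log[sector] = data  # in-place update when the sector is already logged

    def read(self, sector: int) -> Optional[bytes]:
        if sector in self.log:
            return self.log[sector]
        return self.flash.get(sector)

    def merge(self) -> None:
        dirty_units = {s // self.sectors_per_erase_unit for s in self.log}
        self.erases += len(dirty_units)  # erase and rewrite each dirty unit once
        self.flash.update(self.log)
        self.log.clear()


if __name__ == "__main__":
    store = HybridStorage(sectors_per_erase_unit=64, log_capacity=4)
    for i in range(100):
        store.write(7, bytes([i]))  # 100 updates to the same sector
    print(store.erases, len(store.log))  # no merge yet: one log entry
    for sector in range(100, 104):
        store.write(sector, b"x")  # fills the log and triggers a merge
    print(store.erases, store.read(7))
```

One hundred updates to the same sector occupy a single log entry and cause no flash erases; only when the log fills are the dirty erase units merged back into flash.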
Fault Resiliency for Exascale Systems
- Microprocessors are becoming unreliable: process scaling, voltage scaling, soft errors, NBTI, …
- Even assuming the socket MTTF remains constant:
  system MTTF = socket MTTF / number of sockets
- 1 socket: socket MTTF = 5 years
- Exascale (~100,000 sockets): system MTTF = 26 minutes
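The 26-minute figure follows directly from the formula above. A quick check, assuming 365-day years and independent socket failures (so that failure rates add):

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def system_mttf_minutes(socket_mttf_years, sockets):
    """System MTTF = socket MTTF / number of sockets; this assumes independent,
    exponentially distributed socket failures, so failure rates add up."""
    return socket_mttf_years * MINUTES_PER_YEAR / sockets


if __name__ == "__main__":
    print(f"{system_mttf_minutes(5, 100_000):.1f} minutes")  # 26.3 minutes
```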
Checkpoint / Restart
- Checkpoint/restart is the state of the art.
- Hard disk drives (HDDs) serve as the checkpoint storage:
  - HDD peak bandwidth: ~100 MB/s
  - BlueGene/L: 12 minutes to take a checkpoint, equivalent to an 8% performance loss (tolerable)
- Scale to exascale ... unacceptable!
PCRAM – A Good Candidate
Courtesy: Motoyuki Ooishi
                  HDD       NAND Flash    PCRAM
Cell size         -         4-6F²         4-6F²
Read time         ~4 ms     5-50 µs       10-100 ns
Write time        ~4 ms     2-3 ms        100-1000 ns
Standby power     ~1 W      ~0 W          ~0 W
Endurance         10^15     10^5          10^8
- PCRAM is 2 orders of magnitude faster than flash.
- PCRAM has 3 orders of magnitude higher endurance than flash.
- PCRAM is a good candidate for local checkpointing.
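The "orders of magnitude" claims can be checked against the table. A small sketch that compares range endpoints like-for-like (low with low, high with high); the dictionary names are illustrative:

```python
import math

# Values from the HDD / NAND flash / PCRAM table (seconds and write cycles).
# Ranges are kept as (low, high) endpoints.
READ_S = {"flash": (5e-6, 50e-6), "pcram": (10e-9, 100e-9)}
WRITE_S = {"flash": (2e-3, 3e-3), "pcram": (100e-9, 1000e-9)}
ENDURANCE = {"flash": 1e5, "pcram": 1e8}


def orders(larger, smaller):
    """Orders of magnitude between two quantities (log10 of their ratio)."""
    return math.log10(larger / smaller)


if __name__ == "__main__":
    for label, times in (("read", READ_S), ("write", WRITE_S)):
        # Compare low endpoint with low endpoint, high with high.
        spans = [orders(f, p) for f, p in zip(times["flash"], times["pcram"])]
        print(f"{label}: PCRAM faster by {min(spans):.1f} to {max(spans):.1f} orders")
    print(f"endurance: {orders(ENDURANCE['pcram'], ENDURANCE['flash']):.1f} orders higher")
```

Reads come out about 2.7 orders faster and writes 3.5 to 4.3 orders faster, so "2 orders faster" is a conservative summary; the endurance gap is exactly 3 orders.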
How to Integrate PCRAM: 3D PCRAM
- Deploy PCRAM directly on top of DRAM.
- Possible local bandwidth: ~2.5 TB/s (DIMM bandwidth: ~10 GB/s)
[Figure: PCRAM layer stacked on top of DRAM]
Parameter                           Value
Bank size                           32 MB
Mat count                           16
Required TSV pitch                  < 74 µm
ITRS TSV pitch projection (2012)    3.8 µm
3D-PCRAM delay                      0.8 ms
Equivalent bandwidth                2500 GB/s
(Collaboration with HP Labs, Exascale Computing Lab, Dr. Norm Jouppi; SC 2009)
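Taking the table's delay and equivalent bandwidth together, the implied checkpoint volume is about 2500 GB/s × 0.8 ms = 2 GB (an inference from the two rows, not a number stated on the slide). A sketch comparing how long that volume takes over each path mentioned in these slides, assuming sustained peak bandwidth:

```python
# Bandwidths of the checkpoint paths mentioned in the slides, in GB/s.
BANDWIDTH_GB_S = {
    "HDD (~100 MB/s)": 0.1,
    "DIMM (~10 GB/s)": 10.0,
    "3D PCRAM (2500 GB/s)": 2500.0,
}

# Checkpoint volume implied by the table: equivalent bandwidth x 3D-PCRAM delay.
# This 2 GB figure is an inference from the two rows, not a value on the slide.
IMPLIED_SIZE_GB = 2500.0 * 0.8e-3


def checkpoint_seconds(size_gb, bandwidth_gb_s):
    """Time to stream a checkpoint of size_gb at a sustained bandwidth."""
    return size_gb / bandwidth_gb_s


if __name__ == "__main__":
    for path, bw in BANDWIDTH_GB_S.items():
        ms = checkpoint_seconds(IMPLIED_SIZE_GB, bw) * 1e3
        print(f"{path:22s} {ms:10.1f} ms")
```

The same 2 GB takes about 20 s over a ~100 MB/s disk and 0.2 s over a DIMM channel, versus 0.8 ms into stacked PCRAM.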
Our Projection
[Figure: projection timeline, 2008 to 2017]
More Details
- Xiangyu Dong, X. Wu, Guangyu Sun, Yuan Xie, H. Li, Y. Chen, "Circuit and Microarchitecture Evaluation of 3D MRAM," DAC 2008.
- Xiangyu Dong, Norm Jouppi, Yuan Xie, "PCRAMsim: System-Level Performance, Energy, and Area Modeling for Phase-Change RAM," ICCAD 2009.
- G. Sun, X. Dong, Y. Xie, J. Li, Y. Chen, "Novel MRAM-Stacking Architecture for CMP," HPCA 2009.
- Xiaoxia Wu, J. Li, L. Zhang, E. Speight, Yuan Xie, "Hybrid Cache Architecture with Disparate Memory Technologies," ISCA 2009.
- Guangyu Sun, Y. Joo, Y. Chen, Yuan Xie, Y. Chen, H. Li, "A Hybrid Solid-State Storage Architecture for Performance, Energy Consumption and Lifetime Improvement," HPCA 2010.
- Y. Joo, D. Niu, Guangyu Sun, Xiangyu Dong, Y. Xie, "Energy- and Endurance-Aware Design of PCRAM Caches," DATE 2010.
- Xiangyu Dong, N. Muralimanohar, Norm Jouppi, Richard Kaufmann, Yuan Xie, "Leveraging 3D PCRAM Technologies to Reduce Checkpoint Overhead for Future Exascale Systems," SC 2009.
http://www.cse.psu.edu/~yuanxie/3d.html
Conclusion
- Emerging NVMs are very attractive: they combine the speed of SRAM, the density of DRAM, and the non-volatility of Flash memory.
  - Attractive features: high density, low leakage, non-volatility
  - Undesirable features:
    - Write-related: long write latency, high write energy, low endurance (e.g., PCRAM)
    - Cost (needs large-volume production)
- Solution: hybrid cache/memory/storage + 3D?
- Enabling unique applications