ThyNVM Software-Transparent Crash Consistency
for Persistent Memory
Onur Mutlu [email protected]
(joint work with Jinglei Ren, Jishen Zhao, Samira Khan, Jongmoo Choi, Yongwei Wu)
August 8, 2016 Flash Memory Summit 2016, Santa Clara, CA
Original Paper (I)
2
Original Paper (II) n Presented at ACM/IEEE MICRO Conference in Dec 2015.
n Full paper for details: q Jinglei Ren, Jishen Zhao, Samira Khan, Jongmoo Choi, Yongwei
Wu, and Onur Mutlu, "ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems" Proceedings of the 48th International Symposium on Microarchitecture (MICRO), Waikiki, Hawaii, USA, December 2015. [Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)] [Poster (pptx) (pdf)] [Source Code]
q https://users.ece.cmu.edu/~omutlu/pub/ThyNVM-transparent-crash-consistency-for-persistent-memory_micro15.pdf
3
The Main Memory System
n Main memory is a critical component of all computing systems: server, mobile, embedded, desktop, sensor
n Main memory system must scale (in size, technology, efficiency, cost, and management algorithms) to maintain performance growth and technology scaling benefits
4
Processor and caches
Main Memory Storage (SSD/HDD)
Limits of Charge Memory n Difficult charge placement and control
q Flash: floating gate charge q DRAM: capacitor charge, transistor leakage
n Reliable sensing becomes difficult as charge storage unit size reduces
5
Emerging NVM Technologies n Some emerging resistive memory technologies seem more
scalable than DRAM (and they are non-volatile)
n Example: Phase Change Memory q Data stored by changing phase of material q Data read by detecting material’s resistance q Expected to scale to 9nm (2022 [ITRS]) q Prototyped at 20nm (Raoux+, IBM JRD 2008) q Expected to be denser than DRAM: can store multiple bits/cell
n But, emerging technologies have (many) shortcomings q Can they be enabled to replace/augment/surpass DRAM?
6
Promising NVM Technologies n PCM
q Inject current to change material phase q Resistance determined by phase
n STT-MRAM q Inject current to change magnet polarity q Resistance determined by polarity
n Memristors/RRAM/ReRAM q Inject current to change atomic structure q Resistance determined by atom distance
7
NVM as Main Memory Replacement n Very promising
q persistence, high capacity, OK latency, low idle power
n Can enable merging of memory and storage
n Two example works that show benefits q Lee, Ipek, Mutlu, Burger, “Architecting Phase Change
Memory as a Scalable DRAM Alternative,” ISCA 2009. q Kultursay, Kandemir, Sivasubramaniam, Mutlu,
“Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative,” ISPASS 2013.
8
Two Example Works
9
Architected STT-MRAM as Main Memory n 4-core, 4GB main memory, multiprogrammed workloads n ~6% performance loss, ~60% energy savings vs. DRAM
10
[Charts: performance vs. DRAM (88%-98%) for STT-RAM (base) and STT-RAM (opt), and energy vs. DRAM (0%-100%) broken into ACT+PRE, WB, and RB components.]
Kultursay+, “Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative,” ISPASS 2013.
A More Viable Approach: Hybrid Memory Systems
Meza+, “Enabling Efficient and Scalable Hybrid Memories,” IEEE Comp. Arch. Letters, 2012. Yoon+, “Row Buffer Locality Aware Caching Policies for Hybrid Memories,” ICCD 2012 Best Paper Award.
[Diagram: CPU with a DRAM controller and an NVM controller. DRAM: fast, durable, but small, leaky, volatile, high-cost. NVM (Type X): large, non-volatile, low-cost, but slow, wears out, high active energy.]
Hardware/software manage data allocation and movement to achieve the best of multiple technologies
Some Opportunities with Emerging Technologies
n Merging of memory and storage q e.g., a single interface to manage all data
n New applications q e.g., ultra-fast checkpoint and restore
n More robust system design q e.g., reducing data loss
n Processing tightly-coupled with memory q e.g., enabling efficient search and filtering
12 Meza+, “A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.
TWO-LEVEL STORAGE MODEL
[Diagram: the CPU accesses MEMORY (volatile, fast, byte-addressable: DRAM) via Ld/St, and STORAGE (nonvolatile, slow, block-addressable) via FILE I/O.]
13
TWO-LEVEL STORAGE MODEL
[Diagram: the same two-level model, with NVM (PCM, STT-RAM) bridging the two levels.]
14
Non-volatile memories combine characteristics of memory and storage
Coordinated Memory and Storage with NVM (I)
n The traditional two-level storage model is a bottleneck with NVM
q Volatile data in memory → a load/store interface
q Persistent data in storage → a file system interface
q Problem: Operating system (OS) and file system (FS) code to locate, translate, and buffer data become performance and energy bottlenecks with fast NVM stores (see the sketch after the diagram below)
15
Two-Level Store
[Diagram: the processor and caches reach Main Memory through Load/Store with virtual memory address translation, and reach Storage (SSD/HDD) through the operating system and file system (fopen, fread, fwrite, ...). Persistent (e.g., Phase-Change) Memory could serve both roles.]
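To make the bottleneck concrete, here is a minimal C sketch of the same one-integer update through both interfaces; the file name, array, and index are hypothetical, and error handling is mostly omitted:

#include <stdio.h>

/* Two-level store: the OS and file system must locate, translate, and
   buffer the data on every update. */
void update_two_level(long i) {
    int v;
    FILE *f = fopen("counts.dat", "r+b");   /* FS metadata lookup (syscalls) */
    if (f == NULL) return;
    fseek(f, i * (long)sizeof(int), SEEK_SET);
    fread(&v, sizeof v, 1, f);              /* buffer the block into DRAM */
    v++;
    fseek(f, i * (long)sizeof(int), SEEK_SET);
    fwrite(&v, sizeof v, 1, f);             /* write back through the FS */
    fclose(f);
}

/* Single-level store: the persistent array is mapped into the address
   space, so the update is ordinary loads and stores, no system calls. */
void update_persistent(int *counts, long i) {
    counts[i]++;
}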
Coordinated Memory and Storage with NVM (II)
n Goal: Unify memory and storage management in a single unit to eliminate wasted work to locate, transfer, and translate data q Improves both energy and performance q Simplifies programming model as well
16
Unified Memory/Storage
[Diagram: the processor and caches issue Load/Store requests to a Persistent Memory Manager, which manages Persistent (e.g., Phase-Change) Memory and provides feedback.]
Meza+, “A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.
PERSISTENT MEMORY
[Diagram: the CPU accesses PERSISTENT MEMORY (NVM) directly via Ld/St.]
Provides an opportunity to manipulate persistent data directly
17
Performance Benefits of Persistent Memory
18
[Chart: normalized execution time, broken into User CPU, User Memory, Syscall CPU, and Syscall I/O, for HDD 2-level, NVM 2-level, and Persistent Memory systems. NVM 2-level cuts execution time to 0.044 (~24X); the persistent memory organization cuts it further to 0.009 (a further ~5X).]
Meza+, “A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.
Energy Benefits of Persistent Memory
19
[Chart: fraction of total energy, broken into User CPU, Syscall CPU, DRAM, NVM, and HDD, for HDD 2-level, NVM 2-level, and Persistent Memory systems. NVM 2-level reduces energy to 0.065 (~16X); the persistent memory organization reduces it further to 0.013 (a further ~5X).]
Meza+, “A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.
On Persistent Memory Benefits & Challenges
n Justin Meza, Yixin Luo, Samira Khan, Jishen Zhao, Yuan Xie, and Onur Mutlu, "A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory" Proceedings of the 5th Workshop on Energy-Efficient Design (WEED), Tel-Aviv, Israel, June 2013. Slides (pptx) Slides (pdf)
20
The Persistent Memory Manager (PMM) n Exposes a load/store interface to access persistent data
q Applications can directly access persistent memory → no conversion, translation, location overhead for persistent data
n Manages data placement, location, persistence, security q To get the best of multiple forms of storage
n Manages metadata storage and retrieval q This can lead to overheads that need to be managed
n Exposes hooks and interfaces for system software q To enable better data placement and management decisions
n Meza+, “A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.
21
The Persistent Memory Manager (PMM)
22
2.2.1 Efficient Hardware and Software Support: We propose to investigate the efficient software and hardware support needed for single-level stores. A single-level store system should provide an abstraction that maps persistent user data to physical addresses in memory. A software interface for programs would map a pointer to the actual persistent data. Programs would be able to access any part of the data using normal load and store instructions. Figure 2 shows two examples of high-level abstractions which could be provided to programs to access persistent data in a single-level store system. In it, a program creates a persistent file (Figure 2 left) or object (Figure 2 right) using the handle "file.dat" and allocates an array of integers in it. Later, perhaps after the application or system is restarted, when the program executes the updateValue() function, the system retrieves the persistent data for the same handle, and the program modifies its state. With such an abstraction, a single-level store can eliminate the OS system calls to transfer data to and from disk. In addition, it eliminates the need for a file system to track physical file addresses by traversing metadata (such as inodes) in the OS. In this way, single-level stores provide the opportunity to design a simple and efficient persistent data lookup system in hardware. We plan to research efficient ways to map files or objects to the virtual address space. In such a hardware-based design, the processor would manage how data handles correspond to physical addresses. Note that single-level stores can use alternative design choices, such as segments, to provide the high-level abstraction instead of files or objects. Regardless, segments, files, or objects will be mapped to physical addresses with hardware support. Prior works tried to make file lookup and update efficient in software [27, 28] in the presence of persistent memory, and other works proposed using complex and potentially inefficient hardware directory techniques (e.g., [15]). Our goal is to design fast and efficient techniques that take into account the byte addressability of persistent memory in a single-level store. To this end, we will research the following:
• The efficient use of hash table and B-tree indices for locating files in a single-level store.
• How techniques such as key-value stores can provide fast and efficient lookups in single-level stores.
• Policies for intelligently caching some entries of these indices in hardware to improve system performance.

Every access in the single-level store needs to be translated from a virtual address used by a program to a physical address used to access a device. We will investigate how to efficiently manage address translation so that locating data is simple and fast. We intend to explore the following directions to solve this problem:
• We will design mechanisms to predict access patterns based on program behavior and pre-compute virtual-to-physical address translations. We are interested in answering questions such as: What is the pattern of data accesses to a single-level store, and how can prefetching techniques be redesigned with single-level stores in mind to enable efficient address translation? How can simple application-level or profile-based hints on access patterns be communicated to and used by hardware to make address translation and prefetching efficient?
• We will design efficient translation lookaside buffer (TLB)-like structures which will cache the translation between virtual and physical addresses, but for a much larger amount of physical memory than in existing systems. In the presence of such a single-level store, many disparate data accesses could need a large translation table to be serviced effectively. To reduce the overhead of such hardware, we are interested in exploring whether TLB-like structures should favor storing translations only for particular classes of data, such as data with locality or data which is on the critical path of execution, which get the most benefit out of the limited structure space. In addition, we will investigate centralized versus distributed translation structures to design techniques that, for example, buffer private data translation entries near the processor, while distributing shared entries across processors to minimize translation overheads. Such translation structures [...]
int main(void) {
    // data in file.dat is persistent
    FILE myData = "file.dat";
    myData = new int[64];
}

void updateValue(int n, int value) {
    FILE myData = "file.dat";
    myData[n] = value;  // value is persistent
}

int main(void) {
    // data in file.dat is persistent
    int *myData = new PersistentObject("file.dat");
    myData = new int[64];
}

void updateValue(int n, int value) {
    int *myData = PersistentObject.open("file.dat");
    myData[n] = value;  // value is persistent
}
Figure 2: Sample program with access to file-based (left) and object-based (right) persistent data.
[Diagram: the Persistent Memory Manager sits in hardware between Load/Store accesses to persistent objects and a heterogeneous array of devices (DRAM, Flash, NVM, HDD); software supplies data layout, persistence, metadata, security, ..., plus hints from SW/OS/runtime.]
PMM uses access and hint information to allocate, locate, migrate and access data in the heterogeneous array of devices
One Key Challenge
n How to ensure consistency of system/data if all memory is persistent?
n Two extremes q Programmer transparent: Let the system handle it q Programmer only: Let the programmer handle it
n Many alternatives in-between…
23
CHALLENGE: CRASH CONSISTENCY
System crash can result in permanent data corruption in NVM
24
Persistent Memory System
CURRENT SOLUTIONS
Explicit interfaces to manage consistency
– NV-Heaps [ASPLOS'11], BPFS [SOSP'09], Mnemosyne [ASPLOS'11]
AtomicBegin { Insert a new node; } AtomicEnd;
Limits adoption of NVM
Have to rewrite code with a clear partition between volatile and non-volatile data
Burden on the programmers
25
OUR APPROACH: ThyNVM
26
Goal: Software-transparent crash consistency
in persistent memory systems
ThyNVM: Summary
27
• Checkpoints at multiple granularities to reduce both checkpointing latency and metadata overhead
• Overlaps checkpointing and execution to reduce checkpointing latency
• Adapts to DRAM and NVM characteristics
Performs within 4.9% of an idealized DRAM with zero-cost consistency
A new hardware-based checkpointing mechanism
OUTLINE
28
Crash Consistency Problem
Current Solutions
ThyNVM
Evaluation
Conclusion
CRASH CONSISTENCY PROBLEM
29
Add a node to a linked list
1. Link to next 2. Link to prev
System crash can result in inconsistent memory state
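A minimal C sketch of why the two-step insert is dangerous when the list lives in persistent memory (types and names are illustrative):

typedef struct node {
    struct node *next;
    struct node *prev;
    int value;
} node_t;

/* Insert new_node between prev and prev->next in a doubly linked list
   kept in NVM. The two list updates cannot both happen atomically. */
void list_insert_after(node_t *prev, node_t *new_node) {
    node_t *next = prev->next;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;    /* 1. link to next */
    /* a crash here persists a half-updated list: prev->next and
       next->prev now disagree, and the corruption survives reboot */
    next->prev = new_node;    /* 2. link to prev */
}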
OUTLINE
30
Crash Consistency Problem
Current Solutions
ThyNVM
Evaluation
Conclusion
CURRENT SOLUTIONS
31
void hashtable_update(hashtable_t* ht, void *key, void *data) {
    list_t* chain = get_chain(ht, key);
    pair_t* pair;
    pair_t updatePair;
    updatePair.first = key;
    pair = (pair_t*) list_find(chain, &updatePair);
    pair->second = data;
}
Example code: update a node in a persistent hash table
Explicit interfaces to manage consistency – NV-Heaps [ASPLOS'11], BPFS [SOSP'09], Mnemosyne [ASPLOS'11]
CURRENT SOLUTIONS
32
void TMhashtable_update(TM_ARGDECL hashtable_t* ht, void *key, void *data) {
    list_t* chain = get_chain(ht, key);
    pair_t* pair;
    pair_t updatePair;
    updatePair.first = key;
    pair = (pair_t*) TMLIST_FIND(chain, &updatePair);
    pair->second = data;
}
CURRENT SOLUTIONS
33
Manual declaration of persistent components
CURRENT SOLUTIONS
34
Need a new implementation
CURRENT SOLUTIONS
35
Third-party code can be inconsistent
CURRENT SOLUTIONS
36
Prohibited operation
Burden on the programmers
OUTLINE
37
Crash Consistency Problem
Current Solutions
ThyNVM
Evaluation
Conclusion
OUR GOAL
38
Software transparent consistency in persistent memory systems
• Execute legacy applications
• Reduce burden on programmers
• Enable easier integration of NVM
NO MODIFICATION IN THE CODE
void hashtable_update(hashtable_t* ht, void *key, void *data) {
    list_t* chain = get_chain(ht, key);
    pair_t* pair;
    pair_t updatePair;
    updatePair.first = key;
    pair = (pair_t*) list_find(chain, &updatePair);
    pair->second = data;
}
RUN THE EXACT SAME CODE…
Persistent Memory System
Software-transparent memory crash consistency
40
ThyNVM APPROACH
[Timeline: execution is divided into epochs; each epoch runs and is then checkpointed (Epoch 0: Running → Checkpointing; Epoch 1: Running → Checkpointing; ...).]
41
Periodic checkpointing of data managed by hardware
Transparent to application and system
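A minimal sketch of the epoch idea in C (the two-copy cell below is illustrative, not ThyNVM's actual structure): every datum keeps a working copy and a committed checkpoint copy, and the end-of-epoch commit is a single atomic flip, so a crash always recovers to the end of the last completed epoch.

typedef struct {
    int slot[2];    /* two copies of the datum in NVM                   */
    int committed;  /* index of the last checkpointed (consistent) copy */
} ckpt_cell_t;

/* During an epoch, updates go only to the working copy. */
void epoch_write(ckpt_cell_t *c, int value) {
    c->slot[1 - c->committed] = value;
}

/* End of epoch: once the working copy is durably in NVM, flip the
   committed index; the flip is the epoch's single commit point. */
void checkpoint_commit(ckpt_cell_t *c) {
    c->committed = 1 - c->committed;
}

/* Crash recovery reads only committed copies. */
int recover_read(const ckpt_cell_t *c) {
    return c->slot[c->committed];
}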
CHECKPOINTING OVERHEAD
[Timeline: the same Running → Checkpointing epochs.]
42
1. Metadata overhead
[Metadata table: maps working locations to checkpoint locations (X → X′, Y → Y′).]
2. Checkpointing latency
1. METADATA AND CHECKPOINTING GRANULARITY
PAGE granularity: one entry per page → small metadata
CACHE-BLOCK granularity: one entry per block → huge metadata
43
[Diagram: metadata tables at page vs. block granularity, mapping working locations (X, Y) to checkpoint locations (X′, Y′).]
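The tradeoff is easy to quantify with a back-of-the-envelope sketch (the entry layout and capacities below are hypothetical, not ThyNVM's actual numbers):

#include <stdint.h>

/* One metadata entry: working location -> checkpoint location. */
typedef struct {
    uint64_t working_addr;
    uint64_t checkpoint_addr;   /* 16 bytes per entry */
} meta_entry_t;

/* For 16 GB of checkpointed data:
 *   page granularity (4 KB):  4M entries   -> ~64 MB of metadata
 *   block granularity (64 B): 256M entries -> ~4 GB of metadata
 * Fine granularity tracks updates precisely but the table explodes;
 * coarse granularity keeps the table small but checkpoints whole pages. */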
2. LATENCY AND LOCATION
44
DRAM-BASED WRITEBACK
Long latency of writing back data to NVM
[Diagram: working copy X in DRAM, checkpoint copy X′ in NVM. 1. Write back data from DRAM; 2. Update the metadata table.]
2. LATENCY AND LOCATION
45
NVM-BASED REMAPPING
Short latency in NVM-based remapping
[Diagram: data lives in NVM; writes go to a new location. 1. No copying of data; 2. Update the metadata table; 3. Write in a new location.]
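In C-style pseudocode, the latency difference between the two schemes comes down to whether any data has to move at checkpoint time (all helper names are illustrative stubs):

#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096

void nvm_copy(uint64_t dst, uint64_t src, size_t len);    /* stub: copy data into NVM      */
void meta_update(uint64_t working, uint64_t checkpoint);  /* stub: metadata table update   */

/* DRAM-based writeback: checkpoint latency includes copying a full page. */
void checkpoint_writeback(uint64_t dram_page, uint64_t nvm_page) {
    nvm_copy(nvm_page, dram_page, PAGE_SIZE);   /* 1. write back data from DRAM */
    meta_update(dram_page, nvm_page);           /* 2. update the metadata table */
}

/* NVM-based remapping: the epoch's writes already went to a new NVM
   location, so checkpointing only commits the mapping: no data copy. */
void checkpoint_remap(uint64_t block_addr, uint64_t new_nvm_location) {
    meta_update(block_addr, new_nvm_location);
}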
ThyNVM KEY MECHANISMS
Checkpointing granularity
• Small granularity: large metadata
• Large granularity: small metadata
Latency and location
• Writeback from DRAM: long latency
• Remap in NVM: short latency
Based on these, we propose two key mechanisms
1. Dual-granularity checkpointing
2. Overlap of execution and checkpointing
1. DUAL-GRANULARITY CHECKPOINTING
High-write-locality pages in DRAM; low-write-locality pages in NVM
Page Writeback in DRAM: good for streaming writes
Block Remapping in NVM: good for random writes
47
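A sketch of the placement decision in C (the counter and threshold are hypothetical; ThyNVM's actual policy is more involved):

typedef enum {
    PAGE_WRITEBACK_DRAM,   /* dense, streaming writes: one page copy amortizes well */
    BLOCK_REMAP_NVM        /* sparse, random writes: avoid copying whole pages      */
} ckpt_scheme_t;

#define HOT_PAGE_THRESHOLD 16   /* hypothetical */

/* Classify a page by its write locality in the last epoch. */
ckpt_scheme_t choose_scheme(unsigned writes_to_page_last_epoch) {
    return (writes_to_page_last_epoch >= HOT_PAGE_THRESHOLD)
               ? PAGE_WRITEBACK_DRAM
               : BLOCK_REMAP_NVM;
}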
TRADEOFF SPACE
[Timelines: alternative epoch organizations across Epochs 0-2, varying epoch length and the degree of overlap between Running and Checkpointing.]
2. OVERLAPPING CHECKPOINTING AND EXECUTION
Hides the long latency of Page Writeback
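A sketch of the overlap in C-style pseudocode (function names are illustrative stubs): epoch N executes while epoch N-1's pages are written back in the background.

void run_one_epoch(int epoch);               /* stub: execute the program          */
void start_background_writeback(int epoch);  /* stub: persist that epoch's updates */
void wait_for_writeback(int epoch);          /* stub: completion barrier           */

void overlapped_epochs(void) {
    for (int epoch = 1; ; epoch++) {
        start_background_writeback(epoch - 1);  /* checkpoint the previous epoch...  */
        run_one_epoch(epoch);                   /* ...while the current one executes */
        wait_for_writeback(epoch - 1);          /* the previous checkpoint must
                                                   finish before this epoch commits  */
    }
}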
OUTLINE
50
Crash Consistency Problem
Current Solutions
ThyNVM
Evaluation
Conclusion
SYSTEM ARCHITECTURE
MEMORY ADDRESS SPACE
METHODOLOGY
53
Cycle-accurate x86 simulator: gem5
Comparison Points:
Ideal DRAM: DRAM-based, no cost for consistency
– Lowest latency system
Ideal NVM: NVM-based, no cost for consistency
– NVM has higher latency than DRAM
Journaling: Hybrid, commit dirty cache blocks
– Leverages DRAM to buffer dirty blocks
Shadow Paging: Hybrid, copy-on-write pages
– Leverages DRAM to buffer dirty pages
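Simplified sketches of the two baseline schemes (not the evaluated implementations; all types and helpers are illustrative):

#include <stddef.h>

typedef struct { char bytes[64]; }   block_t;   /* a dirty cache block */
typedef struct { char bytes[4096]; } page_t;    /* a memory page       */

void log_append(const block_t *b);          /* stub: append to a redo log in NVM   */
void log_commit(void);                      /* stub: durable, atomic commit record */
page_t *page_alloc_copy(const page_t *p);   /* stub: allocate and copy a page      */

/* Journaling: log each dirty cache block, then write one commit record;
   after the commit is durable, the blocks may be written in place. */
void journal_checkpoint(const block_t *dirty, size_t n) {
    for (size_t i = 0; i < n; i++)
        log_append(&dirty[i]);
    log_commit();
}

/* Shadow paging: copy-on-write an entire page; the commit later swaps
   a page-table pointer to the shadow copy atomically. */
page_t *shadow_write(const page_t *p) {
    return page_alloc_copy(p);
}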
ADAPTIVITY TO ACCESS PATTERN
54
[Two charts: write traffic to NVM, normalized (lower is better), for Journal, Shadow, and ThyNVM under RANDOM and SEQUENTIAL access patterns.]
ThyNVM adapts to both access patterns
Journaling is better for random and shadow paging is better for sequential access patterns
OVERLAPPING CHECKPOINTING AND EXECUTION
55
[Two charts: percentage of execution time spent on checkpointing (lower is better) for Journal, Shadow, and ThyNVM under RANDOM and SEQUENTIAL access patterns.]
Stalls the application for a negligible time
ThyNVM spends only 2.4%/5.5% of the execution time on checkpointing
Journaling and shadow paging can spend 35-45% of the execution time on checkpointing
PERFORMANCE OF LEGACY CODE
56
[Chart: normalized IPC of Ideal DRAM, Ideal NVM, and ThyNVM across gcc, bwaves, milc, leslie., soplex, Gems, lbm, and omnet.]
Provides consistency without significant performance overhead
Within -4.9%/+2.7% of an idealized DRAM/NVM system
KEY-VALUE STORE TX THROUGHPUT
Storage throughput close to Ideal DRAM
OUTLINE
58
Crash Consistency Problem
Current Solutions
ThyNVM
Evaluation
Conclusion
ThyNVM
59
• Checkpoints at multiple granularities to minimize both latency and metadata overhead
• Overlaps checkpointing and execution
• Adapts to DRAM and NVM characteristics
Can enable widespread adoption of persistent memory
A new hardware-based checkpointing mechanism,
with no programming effort
ThyNVM: Enabling Software-Transparent
Crash Consistency in Persistent Memory Systems
Source Code and More Available at http://persper.com/thynvm
Our Other FMS 2016 Talks n "A Large-Scale Study of Flash Memory Errors in the Field”
q Onur Mutlu (ETH Zurich & CMU) August 10 @ 3:50pm q Study of flash-based SSD errors in Facebook data centers
over the course of 4 years q First large-scale field study of flash memory reliability q Forum F-22: SSD Testing (Testing Track)
n Practical Threshold Voltage Distribution Modeling q Yixin Luo (CMU PhD Student) August 10 @ 4:20pm q Forum E-22: Controllers and Flash Technology
n "WARM: Improving NAND Flash Memory Lifetime with Write-hotness Aware Retention Management” q Saugata Ghose (CMU Researcher) August 10 @ 5:45pm q Forum C-22: SSD Concepts (SSDs Track)
61
Referenced Papers and Talks
n All are available at http://users.ece.cmu.edu/~omutlu/projects.htm http://users.ece.cmu.edu/~omutlu/talks.htm
n And, many other previous works on q NVM & Persistent Memory q DRAM q Hybrid memories q NAND flash memory
62
Thank you.
Feel free to email me with any questions & feedback
[email protected] http://users.ece.cmu.edu/~omutlu/
ThyNVM Software-Transparent Crash Consistency
for Persistent Memory
Onur Mutlu [email protected]
(joint work with Jinglei Ren, Jishen Zhao, Samira Khan, Jongmoo Choi, Yongwei Wu)
August 8, 2016 Flash Memory Summit 2016, Santa Clara, CA
References to Papers and Talks
Challenges and Opportunities in Memory
n Onur Mutlu, "Rethinking Memory System Design" Keynote talk at 2016 ACM SIGPLAN International Symposium on Memory Management (ISMM), Santa Barbara, CA, USA, June 2016. [Slides (pptx) (pdf)] [Abstract]
n Onur Mutlu and Lavanya Subramanian, "Research Problems and Opportunities in Memory Systems" Invited Article in Supercomputing Frontiers and Innovations (SUPERFRI), 2015.
66
Phase Change Memory As DRAM Replacement
n Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative" Proceedings of the 36th International Symposium on Computer Architecture (ISCA), pages 2-13, Austin, TX, June 2009. Slides (pdf)
n Benjamin C. Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao, Engin Ipek, Onur Mutlu, and Doug Burger, "Phase Change Technology and the Future of Main Memory" IEEE Micro, Special Issue: Micro's Top Picks from 2009 Computer Architecture Conferences (MICRO TOP PICKS), Vol. 30, No. 1, pages 60-70, January/February 2010.
67
STT-MRAM As DRAM Replacement
n Emre Kultursay, Mahmut Kandemir, Anand Sivasubramaniam, and Onur Mutlu, "Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative" Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX, April 2013. Slides (pptx) (pdf)
68
Taking Advantage of Persistence in Memory n Justin Meza, Yixin Luo, Samira Khan, Jishen Zhao, Yuan Xie, and
Onur Mutlu, "A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory" Proceedings of the 5th Workshop on Energy-Efficient Design (WEED), Tel-Aviv, Israel, June 2013. Slides (pptx) Slides (pdf)
n Jinglei Ren, Jishen Zhao, Samira Khan, Jongmoo Choi, Yongwei Wu, and Onur Mutlu, "ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems" Proceedings of the 48th International Symposium on Microarchitecture (MICRO), Waikiki, Hawaii, USA, December 2015. [Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)] [Poster (pptx) (pdf)] [Source Code]
69
Hybrid DRAM + NVM Systems (I) n HanBin Yoon, Justin Meza, Rachata Ausavarungnirun, Rachael Harding,
and Onur Mutlu, "Row Buffer Locality Aware Caching Policies for Hybrid Memories" Proceedings of the 30th IEEE International Conference on Computer Design (ICCD), Montreal, Quebec, Canada, September 2012. Slides (pptx) (pdf) Best paper award (in Computer Systems and Applications track).
n Justin Meza, Jichuan Chang, HanBin Yoon, Onur Mutlu, and Parthasarathy Ranganathan, "Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management" IEEE Computer Architecture Letters (CAL), February 2012.
70
Hybrid DRAM + NVM Systems (II)
n Dongwoo Kang, Seungjae Baek, Jongmoo Choi, Donghee Lee, Sam H. Noh, and Onur Mutlu, "Amnesic Cache Management for Non-Volatile Memory" Proceedings of the 31st International Conference on Massive Storage Systems and Technologies (MSST), Santa Clara, CA, June 2015. [Slides (pdf)]
71
NVM Design and Architecture
n HanBin Yoon, Justin Meza, Naveen Muralimanohar, Norman P. Jouppi, and Onur Mutlu, "Efficient Data Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories" ACM Transactions on Architecture and Code Optimization (TACO), Vol. 11, No. 4, December 2014. [Slides (ppt) (pdf)] Presented at the 10th HiPEAC Conference, Amsterdam, Netherlands, January 2015. [Slides (ppt) (pdf)]
n Justin Meza, Jing Li, and Onur Mutlu, "Evaluating Row Buffer Locality in Future Non-Volatile Main Memories" SAFARI Technical Report, TR-SAFARI-2012-002, Carnegie Mellon University, December 2012.
72
Our FMS Talks and Posters
• Onur Mutlu, ThyNVM: Software-Transparent Crash Consistency for Persistent Memory, FMS 2016.
• Onur Mutlu, Large-Scale Study of In-the-Field Flash Failures, FMS 2016.
• Yixin Luo, Practical Threshold Voltage Distribution Modeling, FMS 2016.
• Saugata Ghose, Write-hotness Aware Retention Management, FMS 2016.
• Onur Mutlu, Read Disturb Errors in MLC NAND Flash Memory, FMS 2015.
• Yixin Luo, Data Retention in MLC NAND Flash Memory, FMS 2015.
• Onur Mutlu, Error Analysis and Management for MLC NAND Flash Memory, FMS 2014.
• FMS 2016 posters:
- WARM: Improving NAND Flash Memory Lifetime with Write-hotness Aware Retention Management
- Read Disturb Errors in MLC NAND Flash Memory
- Data Retention in MLC NAND Flash Memory
73
Our Flash Memory Works (I)
1. Retention noise study and management
1) Yu Cai, Gulay Yalcin, Onur Mutlu, Erich F. Haratsch, Adrian Cristal, Osman Unsal, and Ken Mai, Flash Correct-and-Refresh: Retention-Aware Error Management for Increased Flash Memory Lifetime, ICCD 2012.
2) Yu Cai, Yixin Luo, Erich F. Haratsch, Ken Mai, and Onur Mutlu, Data Retention in MLC NAND Flash Memory: Characterization, Optimization and Recovery, HPCA 2015.
3) Yixin Luo, Yu Cai, Saugata Ghose, Jongmoo Choi, and Onur Mutlu, WARM: Improving NAND Flash Memory Lifetime with Write-hotness Aware Retention Management, MSST 2015.
2. Flash-based SSD prototyping and testing platform
4) Yu Cai, Erich F. Haratsch, Mark McCartney, Ken Mai, FPGA-based solid-state drive prototyping platform, FCCM 2011.
74
Our Flash Memory Works (II)
3. Overall flash error analysis
5) Yu Cai, Erich F. Haratsch, Onur Mutlu, and Ken Mai, Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis, DATE 2012.
6) Yu Cai, Gulay Yalcin, Onur Mutlu, Erich F. Haratsch, Adrian Cristal, Osman Unsal, and Ken Mai, Error Analysis and Retention-Aware Error Management for NAND Flash Memory, ITJ 2013.
4. Program and erase noise study
7) Yu Cai, Erich F. Haratsch, Onur Mutlu, and Ken Mai, Threshold Voltage Distribution in MLC NAND Flash Memory: Characterization, Analysis and Modeling, DATE 2013.
75
Our Flash Memory Works (III)
5. Cell-to-cell interference characterization and tolerance
8) Yu Cai, Onur Mutlu, Erich F. Haratsch, and Ken Mai, Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation, ICCD 2013.
9) Yu Cai, Gulay Yalcin, Onur Mutlu, Erich F. Haratsch, Osman Unsal, Adrian Cristal, and Ken Mai, Neighbor-Cell Assisted Error Correction for MLC NAND Flash Memories, SIGMETRICS 2014.
6. Read disturb noise study
10) Yu Cai, Yixin Luo, Saugata Ghose, Erich F. Haratsch, Ken Mai, and Onur Mutlu, Read Disturb Errors in MLC NAND Flash Memory: Characterization and Mitigation, DSN 2015.
76
Our Flash Memory Works (IV)
7. Flash errors in the field
11) Justin Meza, Qiang Wu, Sanjeev Kumar, and Onur Mutlu, A Large-Scale Study of Flash Memory Errors in the Field, SIGMETRICS 2015.
8. Persistent memory
12) Jinglei Ren, Jishen Zhao, Samira Khan, Jongmoo Choi, Yongwei Wu, and Onur Mutlu, ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems, MICRO 2015.
77
Referenced Papers and Talks
n All are available at http://users.ece.cmu.edu/~omutlu/projects.htm http://users.ece.cmu.edu/~omutlu/talks.htm
n And, many other previous works on NAND flash memory errors and management
78
Related Videos and Course Materials n Undergraduate Computer Architecture Course Lecture
Videos (2013, 2014, 2015)
n Undergraduate Computer Architecture Course Materials (2013, 2014, 2015)
n Graduate Computer Architecture Lecture Videos (2013, 2015)
n Parallel Computer Architecture Course Materials (Lecture Videos)
n Memory Systems Short Course Materials (Lecture Video on Main Memory and DRAM Basics)
79
Additional Slides on Persistent Memory and NVM
Phase Change Memory: Pros and Cons n Pros over DRAM
q Better technology scaling (capacity and cost) q Non volatility q Low idle power (no refresh)
n Cons q Higher latencies: ~4-15x DRAM (especially write) q Higher active energy: ~2-50x DRAM (especially write) q Lower endurance (a cell dies after ~10^8 writes) q Reliability issues (resistance drift)
n Challenges in enabling PCM as DRAM replacement/helper: q Mitigate PCM shortcomings q Find the right way to place PCM in the system
81
PCM-based Main Memory (I) n How should PCM-based (main) memory be organized?
n Hybrid PCM+DRAM [Qureshi+ ISCA’09, Dhiman+ DAC’09]: q How to partition/migrate data between PCM and DRAM
82
PCM-based Main Memory (II) n How should PCM-based (main) memory be organized?
n Pure PCM main memory [Lee et al., ISCA’09, Top Picks’10]:
q How to redesign entire hierarchy (and cores) to overcome PCM shortcomings
83
An Initial Study: Replace DRAM with PCM n Lee, Ipek, Mutlu, Burger, “Architecting Phase Change
Memory as a Scalable DRAM Alternative,” ISCA 2009. q Surveyed prototypes from 2003-2008 (e.g. IEDM, VLSI, ISSCC) q Derived “average” PCM parameters for F=90nm
84
Results: Naïve Replacement of DRAM with PCM n Replace DRAM with PCM in a 4-core, 4MB L2 system n PCM organized the same as DRAM: row buffers, banks, peripherals n 1.6x delay, 2.2x energy, 500-hour average lifetime
n Lee, Ipek, Mutlu, Burger, “Architecting Phase Change Memory as a
Scalable DRAM Alternative,” ISCA 2009. 85
Results: Architected PCM as Main Memory n 1.2x delay, 1.0x energy, 5.6-year average lifetime n Scaling improves energy, endurance, density
n Caveat 1: Worst-case lifetime is much shorter (no guarantees) n Caveat 2: Intensive applications see large performance and energy hits n Caveat 3: Optimistic PCM parameters?
86
A More Viable Approach: Hybrid Memory Systems
Meza+, “Enabling Efficient and Scalable Hybrid Memories,” IEEE Comp. Arch. Letters, 2012. Yoon+, “Row Buffer Locality Aware Caching Policies for Hybrid Memories,” ICCD 2012 Best Paper Award.
[Diagram: CPU with a DRAM controller and a PCM controller. DRAM: fast, durable, but small, leaky, volatile, high-cost. Phase Change Memory (or Tech. X): large, non-volatile, low-cost, but slow, wears out, high active energy.]
Hardware/software manage data allocation and movement to achieve the best of multiple technologies
Data Placement Between DRAM and PCM n Idea: Characterize data access patterns and guide data
placement in hybrid memory
n Streaming accesses: As fast in PCM as in DRAM
n Random accesses: Much faster in DRAM
n Idea: Place random access data with some reuse in DRAM; streaming data in PCM
n Yoon+, “Row Buffer Locality-Aware Data Placement in Hybrid Memories,” ICCD 2012 Best Paper Award.
88
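A sketch of the resulting caching test in C (thresholds are hypothetical; see Yoon+ ICCD 2012 for the actual RBLA policy):

#include <stdbool.h>

/* Cache a PCM row in DRAM only if it often misses the row buffer
   (DRAM helps most on row-buffer misses) and shows enough reuse to
   amortize the cost of migrating it. */
bool should_cache_in_dram(unsigned row_buffer_misses, unsigned accesses) {
    const unsigned MISS_THRESHOLD  = 4;   /* hypothetical */
    const unsigned REUSE_THRESHOLD = 2;   /* hypothetical */
    return row_buffer_misses >= MISS_THRESHOLD && accesses >= REUSE_THRESHOLD;
}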
Hybrid vs. All-PCM/DRAM [ICCD'12]
[Charts: normalized weighted speedup, maximum slowdown, and performance per watt for 16GB PCM, RBLA-Dyn, and 16GB DRAM.]
31% better performance than all-PCM, within 29% of all-DRAM performance
Yoon+, "Row Buffer Locality-Aware Data Placement in Hybrid Memories," ICCD 2012 Best Paper Award.
STT-MRAM as Main Memory n Magnetic Tunnel Junction (MTJ) device
q Reference layer: Fixed magnetic orientation q Free layer: Parallel or anti-parallel
n Magnetic orientation of the free layer determines logical state of device q High vs. low resistance
n Write: Push large current through MTJ to change orientation of free layer
n Read: Sense current flow
n Kultursay et al., “Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative,” ISPASS 2013.
[Diagram: MTJ structure with a fixed reference layer and a free layer separated by a barrier; parallel orientation encodes logical 0, anti-parallel encodes logical 1. Each cell pairs an MTJ with an access transistor connected to a word line, bit line, and sense line.]
STT-MRAM: Pros and Cons n Pros over DRAM
q Better technology scaling q Non volatility q Low idle power (no refresh)
n Cons q Higher write latency q Higher write energy q Reliability?
n Another level of freedom q Can trade off non-volatility for lower write latency/energy (by
reducing the size of the MTJ)
91
Architected STT-MRAM as Main Memory n 4-core, 4GB main memory, multiprogrammed workloads n ~6% performance loss, ~60% energy savings vs. DRAM
92
[Charts: performance vs. DRAM (88%-98%) for STT-RAM (base) and STT-RAM (opt), and energy vs. DRAM (0%-100%) broken into ACT+PRE, WB, and RB components.]
Kultursay+, “Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative,” ISPASS 2013.
Other Opportunities with Emerging Technologies
n Merging of memory and storage q e.g., a single interface to manage all data
n New applications q e.g., ultra-fast checkpoint and restore
n More robust system design q e.g., reducing data loss
n Processing tightly-coupled with memory q e.g., enabling efficient search and filtering
93
Coordinated Memory and Storage with NVM (I)
n The traditional two-level storage model is a bottleneck with NVM
q Volatile data in memory → a load/store interface
q Persistent data in storage → a file system interface
q Problem: Operating system (OS) and file system (FS) code to locate, translate, and buffer data become performance and energy bottlenecks with fast NVM stores
94
Two-Level Store
[Diagram: the processor and caches reach Main Memory through Load/Store with virtual memory address translation, and reach Storage (SSD/HDD) through the operating system and file system (fopen, fread, fwrite, ...). Persistent (e.g., Phase-Change) Memory could serve both roles.]
Coordinated Memory and Storage with NVM (II)
n Goal: Unify memory and storage management in a single unit to eliminate wasted work to locate, transfer, and translate data q Improves both energy and performance q Simplifies programming model as well
95
Unified Memory/Storage
[Diagram: the processor and caches issue Load/Store requests to a Persistent Memory Manager, which manages Persistent (e.g., Phase-Change) Memory and provides feedback.]
Meza+, “A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.
The Persistent Memory Manager (PMM) n Exposes a load/store interface to access persistent data
q Applications can directly access persistent memory → no conversion, translation, location overhead for persistent data
n Manages data placement, location, persistence, security q To get the best of multiple forms of storage
n Manages metadata storage and retrieval q This can lead to overheads that need to be managed
n Exposes hooks and interfaces for system software q To enable better data placement and management decisions
n Meza+, “A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.
96
The Persistent Memory Manager (PMM)
97
[Diagram: the Persistent Memory Manager sits in hardware between Load/Store accesses to persistent objects and a heterogeneous array of devices (DRAM, Flash, NVM, HDD); software supplies data layout, persistence, metadata, security, ..., plus hints from SW/OS/runtime.]
PMM uses access and hint information to allocate, locate, migrate and access data in the heterogeneous array of devices
Performance Benefits of a Single-Level Store
98
[Chart: normalized execution time, broken into User CPU, User Memory, Syscall CPU, and Syscall I/O, for HDD 2-level, NVM 2-level, and Persistent Memory systems. NVM 2-level cuts execution time to 0.044 (~24X); the persistent memory organization cuts it further to 0.009 (a further ~5X).]
Meza+, “A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.
Energy Benefits of a Single-Level Store
99
[Chart: fraction of total energy, broken into User CPU, Syscall CPU, DRAM, NVM, and HDD, for HDD 2-level, NVM 2-level, and Persistent Memory systems. NVM 2-level reduces energy to 0.065 (~16X); the persistent memory organization reduces it further to 0.013 (a further ~5X).]
Meza+, “A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.