Transactional Flash
V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008
Shimin Chen
Big Data Reading Group
Introduction
SSD: block-level API, same as disks
• Lost opportunity
Goal: new abstractions that better match the nature of the new medium as well as the needs of file systems and databases
Idea: Transactional Flash (TxFlash)
• An SSD (with new features)
• Addressing: a linear array of pages
• Supports read and write operations
• Supports a simple transactional construct
  • Each transaction consists of a series of write operations
  • Atomicity
  • Isolation
  • Durability
Why is this useful?
Transaction abstraction required in many places: file system journals, etc.
Each application implements its own
• Complexity
• Redundant work
• Reliability of the implementation
It would be great if the storage layer provided a transactional API
Previous Work: disk-based
Copy-on-write + logging
• Fragmentation leads to poor read performance
Checkpointing and cleaning
• Cleaning cost
SSDs mitigate these problems
• SSDs already do copy-on-write for flash-related reasons
• Random read accesses are fast
Outline
• Introduction
• The Case for TxFlash
• Commit Protocols
• Implementation
• Evaluation
• Conclusion
TxFlash Architecture & API
WriteAtomic(p1…pn)
• p1…pn are in one transaction
• Followed by write(p1)…write(pn)
• Atomicity, isolation, durability
Abort
• Aborts an in-progress transaction
In-progress transaction
• Do not issue conflicting writes
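The API above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class and method names (TxFlashModel, write_atomic, write, abort, read) are hypothetical, the model allows a single open transaction, and durability is not modeled at all. It only shows that pages named in WriteAtomic become visible together, or not at all.

```python
class TxFlashModel:
    """Toy model of the WriteAtomic / write / Abort interface (hypothetical names)."""

    def __init__(self):
        self.committed = {}   # page number -> data visible to readers
        self.pending = None   # pages declared by write_atomic, or None if no open transaction
        self.staged = {}      # page number -> data written inside the open transaction

    def write_atomic(self, pages):
        # Assumption of this sketch: only one transaction may be open at a time.
        if self.pending is not None:
            raise RuntimeError("a transaction is already in progress")
        self.pending = set(pages)
        self.staged = {}

    def write(self, page, data):
        if self.pending is None or page not in self.pending:
            self.committed[page] = data          # plain, non-transactional write
            return
        self.staged[page] = data                 # held back until the transaction completes
        if set(self.staged) == self.pending:     # last page written: publish all pages at once
            self.committed.update(self.staged)
            self.pending, self.staged = None, {}

    def abort(self):
        # Drop everything staged by the in-progress transaction.
        self.pending, self.staged = None, {}

    def read(self, page):
        return self.committed.get(page)
```

In this model a reader never sees a partially written transaction: until the last page of the set is written, reads return the previous contents.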
Core of TxFlash
Simple Interface
WriteAtomic: multi-page writes
• Useful for file systems
Not full-fledged transactions: no reads in a transaction
• Reduces complexity
Backward compatible
Flash is good for this purpose
• Copy-on-write: already supported by the FTL
• Fast random reads
• High concurrency: multiple flash chips inside
• New device: a new interface is more likely
Traditional Commit
First write to a log:
• Intention record: (data, page# & version#, transaction ID)
• …
• Intention record
• Commit record
Transaction is committed == commit record exists
Intention records modify the original data
Once the modifications are done, the records can be garbage collected
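The commit rule above (a transaction counts only if its commit record exists) can be sketched as a log replay. This is a minimal illustration, not the paper's implementation: the record layout (dicts with "kind", "txid", "page", "version", "data") and the function name recover are assumptions made for the example.

```python
def recover(log):
    """Replay a log: apply intention records only for transactions with a commit record."""
    # A transaction is committed exactly when its commit record is in the log.
    committed = {rec["txid"] for rec in log if rec["kind"] == "commit"}

    pages = {}  # page number -> (version, data) of the newest committed intention
    for rec in log:
        if rec["kind"] != "intention" or rec["txid"] not in committed:
            continue  # skip commit records and intentions of uncommitted transactions
        page, version = rec["page"], rec["version"]
        if page not in pages or version > pages[page][0]:
            pages[page] = (version, rec["data"])

    return {page: data for page, (version, data) in pages.items()}


example_log = [
    {"kind": "intention", "txid": 1, "page": 10, "version": 1, "data": "a"},
    {"kind": "intention", "txid": 1, "page": 11, "version": 1, "data": "b"},
    {"kind": "commit", "txid": 1},
    {"kind": "intention", "txid": 2, "page": 10, "version": 2, "data": "c"},
    # No commit record for transaction 2, so its intention is ignored.
]
state = recover(example_log)
```

The extra commit record is what the cyclic commit protocols later in the talk try to avoid.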
Traditional Commit on SSDs
Optimizations:
• All writes can be issued in parallel
• Do not update the original data; just update the remap table
Problem: commit record
• Extra latency after the other writes
• Garbage collection is complicated: must know whether all the updates have completed
New Proposal (1): Simple Cyclic Commit
No commit record
Intention records of the same transaction use next links to form a cycle
• (data, page# & version#, next page# & version#)
Transaction is committed == all intention records are written
Flash page (4 KB) and metadata (128 B) are co-located
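The commit test above can be sketched as a walk along the next links. This is a simplified illustration under assumptions: intentions are modeled as a dict from (page, version) to the next-link (page, version), the function name cycle_complete is hypothetical, and the sketch ignores the case where a referenced version has already been superseded or erased.

```python
def cycle_complete(stored, start):
    """Return True if following next links from `start` leads back to `start`.

    `stored` maps (page, version) -> next-link (page, version) for every
    intention record currently on flash (an assumed, simplified layout).
    """
    current = start
    # A valid cycle has at most len(stored) hops; the bound guards against malformed links.
    for _ in range(len(stored) + 1):
        if current not in stored:
            return False          # some intention of the transaction was never written
        current = stored[current]
        if current == start:
            return True           # every record on the cycle is present
    return False


# Transaction writing pages A, B, C: A1 -> B1 -> C1 -> A1.
stored = {("A", 1): ("B", 1), ("B", 1): ("C", 1), ("C", 1): ("A", 1)}
full = cycle_complete(stored, ("A", 1))

# Same transaction, but the write of C1 never reached flash.
partial = {("A", 1): ("B", 1), ("B", 1): ("C", 1)}
broken = cycle_complete(partial, ("A", 1))
```

A single-page transaction is the degenerate case: its next link points to itself, so the cycle is complete as soon as the page is written.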
Problem
Solution:
Any uncommitted intention on the stable storage must be erased before any new writes are issued to the same or a referenced page
Operations
Initialization: set version# to 0, next-link to self
Transaction
Garbage collection:
• Any uncommitted intention
• Any committed page, if a newer version is committed
Recovery: scan all pages, then look for cycles
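The two garbage collection rules above can be written down as a small filter. This is a sketch only: it assumes the set of committed intentions is already known (for example, from a recovery scan), and the function name reclaimable and the (page, version) tuples are illustrative choices, not taken from the paper.

```python
def reclaimable(intentions, committed):
    """Return the intentions that the two garbage collection rules allow erasing.

    intentions: set of (page, version) records present on flash
    committed:  subset of `intentions` known to belong to committed transactions
    """
    # Newest committed version of each page.
    newest = {}
    for page, version in committed:
        newest[page] = max(version, newest.get(page, version))

    garbage = set()
    for page, version in intentions:
        if (page, version) not in committed:
            garbage.add((page, version))   # rule 1: uncommitted intention
        elif version < newest[page]:
            garbage.add((page, version))   # rule 2: superseded by a newer committed version
    return garbage


on_flash = {("A", 1), ("A", 2), ("B", 1), ("B", 2)}
known_committed = {("A", 1), ("A", 2), ("B", 1)}
garbage = reclaimable(on_flash, known_committed)
```

Here A1 is reclaimable because A2 is committed, and B2 is reclaimable because it is uncommitted; B1 must stay because it is the newest committed version of B.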
New Proposal (2): Back Pointer Cyclic Commit
Another way to deal with ambiguity
Intention record:
• (data, page# & version#, next-link, link to last committed version)
A3 is a straddler of A2
Some complexity in garbage collection and recovery because of this
Protocol Comparison
Implementation
Simulator: DiskSim
• Trace-driven SSD simulator (USENIX '08)
• Modifications for TxFlash
• Supports transactions of maximum size 4 MB
Pseudo-device driver for recording traces
TxExt3:
• Employs TxFlash for the Ext3 file system
• Transaction: Ext3 journal commit
Experimental Setup
TxFlash device:
• 32 GB: 8 x 4 GB flash packages
• 4 I/O operations within every flash package
• 15% of space reserved for garbage collection
Workload on top of Ext3:
• IOzone: micro benchmark (no sync writes)
• Linux-build (no sync writes)
• Maildir (sync writes)
• TPC-B: simulate 10,000 credit-debit-like operations on the TxExt3 file system (sync writes)
Synthetic workloads
Cyclic commit vs. Traditional commit
Unlike database logging, transaction sizes are large: no sync; data are included
• Simple cyclic commit has a high cost if there are aborts
TxFlash vs. SSD
• Remove WriteAtomic from the traces
• Use the SSD simulator
• The SSD does not provide any transaction guarantees (so it should have better performance)
Space comparison: TxFlash needs 25% more main memory than the SSD
• 4+1 MB per 4 GB flash: 40 MB for the 32 GB TxFlash device
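The space figures above can be checked with a few lines of arithmetic. The reading assumed here is that "4+1 MB" means 4 MB of main memory per 4 GB package for the plain SSD plus 1 MB extra for TxFlash; the variable names are illustrative.

```python
PACKAGES = 8                    # 32 GB device built from 8 x 4 GB flash packages
SSD_MB_PER_PACKAGE = 4          # assumed baseline memory per package (the "4" in "4+1 MB")
TXFLASH_EXTRA_MB = 1            # assumed TxFlash addition per package (the "+1")

ssd_total_mb = PACKAGES * SSD_MB_PER_PACKAGE
txflash_total_mb = PACKAGES * (SSD_MB_PER_PACKAGE + TXFLASH_EXTRA_MB)
overhead = TXFLASH_EXTRA_MB / SSD_MB_PER_PACKAGE

print(ssd_total_mb, txflash_total_mb, f"{overhead:.0%}")
```

Under that reading the totals are 32 MB for the SSD and 40 MB for TxFlash, which matches the 25% figure on the slide.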
End-to-end performance
TxFlash: run the pseudo-device driver on a real SSD
• The performance is close to that of TxFlash
Ext3: Use SSD as journal
SSD cache is disabled in both cases
Summary
TxFlash: adding a transaction interface to the SSD
Cyclic commit protocols
Nice solution for file system journaling