Transactional Flash: V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008

Posted on 31-Dec-2015



Shimin Chen

Big Data Reading Group

Introduction

• SSDs expose the same block-level API as disks: a lost opportunity
• Goal: new abstractions that better match the nature of the new medium and the needs of file systems and databases
• Idea: Transactional Flash (TxFlash)
  • An SSD with new features
  • Addressing: a linear array of pages
  • Supports read and write operations
  • Supports a simple transactional construct: each transaction consists of a series of write operations and provides atomicity, isolation, and durability

Why is this useful?

• A transaction abstraction is required in many places: file system journals, etc.
• Each application implements its own, which brings complexity, redundant work, and doubts about the reliability of each implementation
• It would be better if the storage layer provided a transactional API

Previous Work: disk-based

• Copy-on-write + logging: fragmentation causes poor read performance
• Checkpointing and cleaning: cleaning cost
• SSDs mitigate these problems:
  • SSDs already do copy-on-write for flash-related reasons
  • Random reads are fast

Outline

• Introduction
• The Case for TxFlash
• Commit Protocols
• Implementation
• Evaluation
• Conclusion

TxFlash Architecture & API


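• WriteAtomic(p1…pn): declares that pages p1…pn form one transaction; it is followed by write(p1)…write(pn), and the transaction provides atomicity, isolation, and durability
• Abort: aborts an in-progress transaction
• In-progress transactions: conflicting writes must not be issued to their pages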

Core of TxFlash
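To make the WriteAtomic/Abort semantics concrete, here is a toy in-memory model in Python. It is a sketch under assumptions, not TxFlash's implementation: the class `ToyTxFlash`, the transaction id returned by `write_atomic`, and the rule that a transaction becomes visible once every declared page has been written are hypothetical choices for illustration; durability and crash recovery are not modeled.

```python
import itertools


class ToyTxFlash:
    """Toy in-memory model of the WriteAtomic/Abort interface (hypothetical)."""

    def __init__(self):
        self._pages = {}    # committed contents: page number -> bytes
        self._pending = {}  # tx id -> {page number: bytes, or None if not yet written}
        self._owner = {}    # page number -> tx id of the in-progress transaction
        self._ids = itertools.count(1)

    def write_atomic(self, pages):
        """Declare that `pages` form one transaction; returns a hypothetical tx id."""
        pages = list(pages)
        if any(p in self._owner for p in pages):
            raise ValueError("conflicting write to a page of an in-progress transaction")
        tx = next(self._ids)
        self._pending[tx] = dict.fromkeys(pages)
        for p in pages:
            self._owner[p] = tx
        return tx

    def write(self, page, data):
        tx = self._owner.get(page)
        if tx is None:
            self._pages[page] = data  # ordinary, non-transactional write
            return
        buf = self._pending[tx]
        buf[page] = data
        if all(v is not None for v in buf.values()):
            self._pages.update(buf)  # every declared page written: all become visible together
            self._finish(tx)

    def abort(self, tx):
        """Drop an in-progress transaction; none of its writes become visible."""
        self._finish(tx)

    def read(self, page):
        return self._pages.get(page)

    def _finish(self, tx):
        for p in self._pending.pop(tx):
            del self._owner[p]
```

Because the model rejects a `write_atomic` that touches a page of an in-progress transaction, it mirrors the slide's requirement that no conflicting writes be issued rather than resolving conflicts itself.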

Simple Interface

• WriteAtomic: multi-page atomic writes, useful for file systems
• Not full-fledged transactions: no reads inside a transaction, which reduces complexity
• Backward compatible

Flash is good for this purpose

• Copy-on-write: already supported by the FTL
• Fast random reads
• High concurrency: multiple flash chips inside
• New device: new interfaces are more likely to be accepted


Traditional Commit

• First write to a log: a series of intention records, each (data, page# & version#, transaction ID), followed by a commit record
• A transaction is committed if and only if its commit record exists
• The intention records are then applied to modify the original data; once the modifications are done, the records can be garbage collected
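The traditional protocol can be sketched as a redo log in Python. The record classes and the `replay`/`truncate` helpers are hypothetical names; the sketch encodes only the rules on the slide: intentions carry data, page#, version#, and a transaction ID; a transaction counts as committed exactly when its commit record is present; and committed records become garbage once their modifications have been applied.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Intention:
    tx_id: int
    page: int
    version: int
    data: bytes


@dataclass(frozen=True)
class Commit:
    tx_id: int


LogRecord = Union[Intention, Commit]


def committed_txs(log: list[LogRecord]) -> set[int]:
    """A transaction is committed iff its commit record is in the log."""
    return {r.tx_id for r in log if isinstance(r, Commit)}


def replay(log: list[LogRecord], pages: dict) -> dict:
    """Apply intentions of committed transactions to the original data.

    `pages` maps page number -> (version, data); a newer version wins.
    """
    done = committed_txs(log)
    for r in log:
        if isinstance(r, Intention) and r.tx_id in done:
            current = pages.get(r.page)
            if current is None or current[0] < r.version:
                pages[r.page] = (r.version, r.data)
    return pages


def truncate(log: list[LogRecord]) -> list[LogRecord]:
    """After replay, records of committed transactions can be garbage collected."""
    done = committed_txs(log)
    return [r for r in log if r.tx_id not in done]
```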

Traditional Commit on SSDs

• Optimizations:
  • All writes can be issued in parallel
  • Instead of updating the original data, just update the remap table
• Problem: the commit record
  • Adds extra latency, since it can only be written after all the other writes complete
  • Garbage collection is complicated: it must know whether all the updates have completed
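A sketch of the SSD variant in Python, assuming a copy-on-write flash model in which every program operation lands at a fresh physical address (`ToyFlash` and `commit_transaction` are hypothetical). The intention writes run in parallel on a thread pool, but the commit record can only be issued after all of them finish, which is the extra latency noted above; the remap table is then pointed at the new copies instead of copying data back.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyFlash:
    """Hypothetical copy-on-write flash: each program() lands at a fresh physical address."""

    def __init__(self):
        self._lock = threading.Lock()
        self.physical = []  # append-only list of programmed records

    def program(self, record):
        with self._lock:
            self.physical.append(record)
            return len(self.physical) - 1


def commit_transaction(flash, remap, tx_id, writes):
    """writes: {logical page: data}. Returns {logical page: physical address}."""
    # 1. Intention records can all be issued in parallel.
    with ThreadPoolExecutor() as pool:
        futures = {page: pool.submit(flash.program, ("intent", tx_id, page, data))
                   for page, data in writes.items()}
        addrs = {page: fut.result() for page, fut in futures.items()}
    # 2. The commit record must wait until every intention has been written.
    flash.program(("commit", tx_id))
    # 3. No copy back to the original location: just repoint the remap table.
    remap.update(addrs)
    return addrs
```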

New Proposal (1): Simple Cyclic Commit

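• No commit record: intention records of the same transaction use next-links to form a cycle; each record is (data, page# & version#, next page# & version#)
• A transaction is committed if and only if all its intention records are written
• A flash page (4KB) and its metadata (128B) are co-located, so the link information is written together with the data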
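A minimal Python sketch of the commit check in simple cyclic commit, assuming each record carries its own (page#, version#) and a next-link (page#, version#) as listed above. Starting from any record, follow next-links; the transaction is committed only if every linked record is present and the walk returns to the start. This sketch ignores garbage collection, so a missing record always means "not committed". The names `Intention` and `is_committed` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intention:
    page: int
    version: int
    next_page: int
    next_version: int
    data: bytes = b""

    @property
    def key(self):
        return (self.page, self.version)

    @property
    def next(self):
        return (self.next_page, self.next_version)


def is_committed(records, start):
    """Follow next-links from `start`; committed iff the cycle closes with every record present."""
    store = {r.key: r for r in records}
    seen = set()
    cur = start
    while cur not in seen:
        rec = store.get(cur)
        if rec is None:
            return False  # broken cycle: some intention record was never written
        seen.add(cur)
        cur = rec.next
    return cur == start
```

Initialization fits the same check: a version-0 page whose next-link points to itself is a one-record cycle and therefore committed.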

Problem: ambiguity when an intention's cycle is incomplete

Solution:

Any uncommitted intention on stable storage must be erased before any new writes are issued to the same page or to a page it references
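The role of this rule can be illustrated in Python under one stated assumption: a missing link target might mean either that the transaction never completed, or that the target was a committed version that garbage collection removed after a newer version superseded it. Because uncommitted intentions are erased before the same or a referenced page is written again, the presence of a newer version of the missing page implies the intention was committed. `SccStore` and its methods are hypothetical names for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intention:
    page: int
    version: int
    next_page: int
    next_version: int


class SccStore:
    """Hypothetical store of intention records keyed by (page, version)."""

    def __init__(self):
        self.records = {}

    def is_committed(self, key):
        newest = {}
        for p, v in self.records:
            newest[p] = max(newest.get(p, -1), v)
        seen, cur = set(), key
        while cur not in seen:
            rec = self.records.get(cur)
            if rec is None:
                page, version = cur
                # Assumed disambiguation: a newer version of the missing page can only
                # exist if the target was committed and later garbage collected.
                return newest.get(page, -1) > version
            seen.add(cur)
            cur = (rec.next_page, rec.next_version)
        return cur == key

    def write_transaction(self, intentions):
        pages = {i.page for i in intentions}
        # Enforce the rule: erase uncommitted intentions on these pages or referencing them.
        doomed = [k for k, r in self.records.items()
                  if (r.page in pages or r.next_page in pages) and not self.is_committed(k)]
        for k in doomed:
            del self.records[k]
        for i in intentions:
            self.records[(i.page, i.version)] = i
```

Without the erase step, a later write to page 3 could leave the stale intention A2 looking as if it pointed at a valid record, which is exactly the confusion the rule prevents.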

Operations

• Initialization: set version# to 0 and the next-link to the page itself
• Transaction
• Garbage collection:
  • Any uncommitted intention can be collected
  • A committed page can be collected if a newer version of it is committed
• Recovery: scan all pages, then look for cycles
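Recovery and garbage collection can be written as a scan over every record, following the operations listed above. This Python sketch assumes records are held in a dict mapping (page#, version#) to the next-link (page#, version#), and uses the same assumed disambiguation for a missing link target (a newer version of that page means the target was committed and then collected). `recover` and `gc_candidates` are hypothetical names.

```python
def is_committed(records, key):
    """records: {(page, version): (next_page, next_version)}."""
    newest = {}
    for p, v in records:
        newest[p] = max(newest.get(p, -1), v)
    seen, cur = set(), key
    while cur not in seen:
        if cur not in records:
            page, version = cur
            return newest.get(page, -1) > version  # assumed GC disambiguation
        seen.add(cur)
        cur = records[cur]
    return cur == key


def recover(records):
    """Scan all pages and return {page: latest committed version}."""
    current = {}
    for page, version in records:
        if is_committed(records, (page, version)):
            current[page] = max(current.get(page, -1), version)
    return current


def gc_candidates(records):
    """Uncommitted intentions, plus committed versions superseded by a newer committed one."""
    current = recover(records)
    return {(p, v) for (p, v) in records
            if not is_committed(records, (p, v)) or v < current[p]}
```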

New Proposal (2): Back Pointer Cyclic Commit

• Another way to deal with the ambiguity: each intention record is (data, page# & version#, next-link, link to the last committed version of the page)
• A3 is a straddler of A2: its back pointer points to a version older than A2
• This adds some complexity to garbage collection and recovery
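For back pointer cyclic commit, the straddler relation can be sketched in Python directly from the record format, assuming the back pointer stores the version number of the page's last committed version at the time the record was written. A record Ak straddles a version Aj of the same page when its back pointer is older than Aj, so Aj was not committed when Ak was written. `BpRecord`, `straddles`, and `straddled_versions` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BpRecord:
    page: int
    version: int
    next_page: int
    next_version: int
    back_version: int  # last committed version of this page when the record was written


def straddles(rec: BpRecord, version: int) -> bool:
    """`rec` straddles `version` of its own page if its back pointer skips over it."""
    return rec.back_version < version < rec.version


def straddled_versions(records):
    """(page, version) pairs straddled by some record of the same page."""
    out = set()
    for r in records:
        for s in records:
            if s.page == r.page and straddles(s, r.version):
                out.add((r.page, r.version))
    return out
```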

Protocol Comparison


Implementation

• Simulator: DiskSim
  • Trace-driven SSD simulator (USENIX’08), with modifications for TxFlash
  • Supports transactions of up to 4MB
• Pseudo-device driver for recording traces
• TxExt3:
  • Employs TxFlash for the Ext3 file system
  • Transaction: an Ext3 journal commit
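As a rough Python illustration of how a file system might hand one journal commit to TxFlash, the function below turns it into a single WriteAtomic request followed by the page writes. It is hypothetical throughout: it assumes one Ext3 journal block maps to one 4KB flash page and that requests are plain tuples. The only numbers taken from the slides are the 4KB page size and the 4MB transaction limit, which together allow at most 1024 pages per transaction.

```python
PAGE_SIZE = 4 * 1024                           # 4KB flash page
MAX_TX_PAGES = (4 * 1024 * 1024) // PAGE_SIZE  # 4MB transaction limit -> 1024 pages


def journal_commit_to_requests(journal_blocks):
    """Turn one journal commit {block number: 4KB bytes} into hypothetical device requests.

    Returns a WriteAtomic header followed by one write per page.
    """
    if not journal_blocks:
        raise ValueError("empty journal commit")
    if len(journal_blocks) > MAX_TX_PAGES:
        raise ValueError("journal commit exceeds the 4MB transaction limit")
    for block, data in journal_blocks.items():
        if len(data) != PAGE_SIZE:
            raise ValueError(f"block {block} is not exactly one 4KB page")
    pages = sorted(journal_blocks)
    requests = [("WriteAtomic", tuple(pages))]
    requests += [("write", p, journal_blocks[p]) for p in pages]
    return requests
```

Rejecting an oversized commit, rather than splitting it, keeps the whole journal commit atomic; a real file system would have to cap its journal transactions instead.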

Experimental Setup

• TxFlash device: 32GB
  • 8x 4GB flash packages
  • 4 I/O operations within every flash package
  • 15% of space reserved for garbage collection
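The device configuration works out as shown in this small Python calculation, assuming the 15% reserve is taken from the full 32GB and that the four operations per package can proceed concurrently across all eight packages (the slides do not spell out either point).

```python
PACKAGES = 8
PACKAGE_GB = 4
OPS_PER_PACKAGE = 4
RESERVE_PERCENT = 15

capacity_gb = PACKAGES * PACKAGE_GB                # 32 GB, as on the slide
reserved_gb = capacity_gb * RESERVE_PERCENT / 100  # space held back for garbage collection
max_parallel_ops = PACKAGES * OPS_PER_PACKAGE      # assumes all packages operate concurrently

print(f"capacity {capacity_gb} GB, reserved {reserved_gb:.1f} GB, "
      f"up to {max_parallel_ops} concurrent operations")
```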

• Workloads on top of Ext3:
  • IOzone: micro-benchmark (no sync writes)
  • Linux-build (no sync writes)
  • Maildir (sync writes)
  • TPC-B: simulates 10,000 credit-debit-like operations on the TxExt3 file system (sync writes)

Synthetic workloads

Cyclic commit vs. Traditional commit

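• Unlike database logging, transactions are large: there are no sync writes, and data are included
• Simple cyclic commit has a high cost if there are aborts, since uncommitted intentions must be erased before the affected pages can be written again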

TxFlash vs. SSD

• Remove WriteAtomic from the traces and use the SSD simulator
• The SSD does not provide any transaction guarantees (so it should have better performance)

• Space comparison: TxFlash needs 25% more main memory than an SSD
• 4+1 MB per 4GB flash package: 40 MB for the 32GB TxFlash device
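The memory figures agree with each other if "4+1 MB" is read as 4MB of state that a plain SSD already keeps per 4GB package plus 1MB extra for TxFlash; that reading is an assumption, checked here in Python.

```python
DEVICE_GB = 32
PACKAGE_GB = 4
SSD_MB_PER_PACKAGE = 4  # assumed: state a plain SSD keeps per 4GB package
TXFLASH_EXTRA_MB = 1    # assumed: additional per-package state for TxFlash

packages = DEVICE_GB // PACKAGE_GB                               # 8
txflash_mb = (SSD_MB_PER_PACKAGE + TXFLASH_EXTRA_MB) * packages  # 40 MB, matching the slide
ssd_mb = SSD_MB_PER_PACKAGE * packages                           # 32 MB
overhead = (txflash_mb - ssd_mb) / ssd_mb                        # 0.25, i.e. 25% more
```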

End-to-end performance

• TxFlash: run the pseudo-device driver on a real SSD (the performance is close to that of TxFlash)
• Ext3: use the SSD as the journal
• The SSD cache is disabled in both cases

Summary

• TxFlash: adds a transactional interface to the SSD
• Cyclic commit protocols
• A nice solution for file system journaling
