Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin Chen Big Data Reading Group

Transactional Flash

V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008

Shimin Chen

Big Data Reading Group

Introduction

• SSDs expose the same block-level API as disks
  • Lost opportunity
• Goal: new abstractions that better match the nature of the new medium as well as the needs of file systems and databases

Idea: Transactional Flash (TxFlash)

• An SSD (with new features)
  • Addressing: a linear array of pages
  • Supports read and write operations
  • Supports a simple transactional construct
• Each transaction consists of a series of write operations
  • Atomicity
  • Isolation
  • Durability

Why is this useful?

• A transaction abstraction is required in many places: file system journals, etc.
• Each application implements its own
  • Complexity
  • Redundant work
  • Reliability of the implementation
• Great if the storage layer provides a transactional API

Previous Work: disk-based

• Copy-on-write + logging
  • Fragmentation leads to poor read performance
• Checkpointing and cleaning
  • Cleaning cost
• SSDs mitigate these problems
  • SSDs already do CoW for flash-related reasons
  • Random read accesses are fast

Outline

• Introduction
• The Case for TxFlash
• Commit Protocols
• Implementation
• Evaluation
• Conclusion

TxFlash Architecture & API

• WriteAtomic(p1…pn)
  • Pages p1…pn belong to one transaction
  • Followed by write(p1)…write(pn)
  • Atomicity, isolation, durability
• Abort
  • Aborts an in-progress transaction
• In-progress transactions
  • Conflicting writes are not issued

Core of TxFlash

Simple Interface

• WriteAtomic: multi-page writes
  • Useful for file systems
• Not full-fledged transactions: no reads inside a transaction
  • Reduces complexity
• Backward compatible
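
The interface above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class name `TxFlashModel`, its method names, and the restriction to a single in-progress transaction are hypothetical simplifications, not the device's actual firmware or API.

```python
class TxFlashModel:
    """Toy model of the slide's interface (hypothetical, not real firmware)."""

    def __init__(self, num_pages):
        self.pages = [b""] * num_pages  # committed page contents
        self.pending = {}               # page number -> staged data (None until written)

    def write_atomic(self, page_numbers):
        """Declare that the next writes to these pages form one transaction."""
        if self.pending:
            # Assumption of this sketch: only one transaction in progress.
            raise RuntimeError("a transaction is already in progress")
        self.pending = {p: None for p in page_numbers}

    def write(self, page, data):
        if page not in self.pending:
            # Ordinary write outside any transaction (backward compatible).
            self.pages[page] = data
            return
        self.pending[page] = data
        if all(v is not None for v in self.pending.values()):
            # The last write of the transaction arrived: publish all pages together.
            for p, d in self.pending.items():
                self.pages[p] = d
            self.pending = {}

    def abort(self):
        """Drop the in-progress transaction; committed pages are untouched."""
        self.pending = {}

    def read(self, page):
        # Reads are not part of a transaction, so they only see committed data.
        return self.pages[page]


dev = TxFlashModel(4)
dev.write_atomic([0, 1])
dev.write(0, b"a")
half_done = dev.read(0)   # still the old contents: the transaction is incomplete
dev.write(1, b"b")
after_commit = (dev.read(0), dev.read(1))
dev.write_atomic([2, 3])
dev.write(2, b"c")
dev.abort()
after_abort = dev.read(2)
```

A plain `write` outside a transaction behaves exactly as on a regular SSD, which is the sense in which the interface stays backward compatible.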

Flash is good for this purpose

• Copy-on-write: already supported by the FTL
• Fast random reads
• High concurrency
  • Multiple flash chips inside
• New device:
  • A new interface is more likely

Outline

• Introduction
• The Case for TxFlash
• Commit Protocols
• Implementation
• Evaluation
• Conclusion

Traditional Commit

• First write to a log:
  • Intention record: (data, page# & version#, transaction ID)
  • …
  • Intention record
  • Commit record
• Transaction is committed == commit record exists
• Intention records are then applied to the original data
• Once the modifications are done, the records can be garbage collected
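
A minimal sketch of the traditional scheme described above, assuming the log is a list of dictionaries. The field names (`kind`, `txid`, `page`, `version`, `data`) and both function names are hypothetical choices for illustration.

```python
def committed_ids(log):
    """A transaction counts as committed only if its commit record is in the log."""
    return {rec["txid"] for rec in log if rec["kind"] == "commit"}


def apply_log(log, store):
    """Apply the intention records of committed transactions to the original data."""
    done = committed_ids(log)
    for rec in log:
        if rec["kind"] == "intention" and rec["txid"] in done:
            store[rec["page"]] = rec["data"]
    return store


log = [
    {"kind": "intention", "txid": 1, "page": 7, "version": 1, "data": b"new7"},
    {"kind": "intention", "txid": 1, "page": 9, "version": 1, "data": b"new9"},
    {"kind": "commit", "txid": 1},
    # Transaction 2 never wrote its commit record, so it must be ignored.
    {"kind": "intention", "txid": 2, "page": 7, "version": 2, "data": b"lost"},
]
store = apply_log(log, {7: b"old7", 9: b"old9"})
```

The commit record is the single point that decides the outcome, which is why it has to be written after every intention record of its transaction.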

Traditional Commit on SSDs

• Optimizations:
  • All writes can be issued in parallel
  • No need to update the original data; just update the remap table
• Problem: commit record
  • Extra latency after the other writes
  • Garbage collection is complicated:
    • Must know whether all the updates have completed

New Proposal (1): Simple Cyclic Commit

• No commit record
• Intention records of the same transaction use next links to form a cycle
  • (data, page# & version#, next page# & version#)
• Transaction is committed == all intention records are written
• Flash page (4KB) + metadata (128B) are co-located
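
The cycle of next links can be sketched as follows. This is an illustration only: the function name `build_intentions`, the dictionary layout, and the assumption that each page's version number increases by one per write are hypothetical, not details taken from the slides.

```python
def build_intentions(writes, versions):
    """Build intention records whose next links form a cycle.

    writes:   list of (page, data) pairs in one transaction
    versions: dict mapping page -> last version used (missing pages count as 0)
    """
    # Assumption: the new version of a page is its previous version plus one.
    ids = [(page, versions.get(page, 0) + 1) for page, _ in writes]
    records = []
    for i, (page, data) in enumerate(writes):
        records.append({
            "id": ids[i],                      # (page#, version#)
            "data": data,
            "next": ids[(i + 1) % len(ids)],   # the last record links back to the first
        })
    return records


recs = build_intentions([("A", b"x"), ("B", b"y")], {"A": 1})
single = build_intentions([("C", b"z")], {})
```

With one page in the transaction the record simply points to itself, so the degenerate cycle is still closed.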

Problem

Solution:

Any uncommitted intention on the stable storage must be erased before any new writes are issued to the same or a referenced page

Operations

• Initialization: set version# to 0, next-link to self
• Transaction
• Garbage collection:
  • For any uncommitted intention
  • For a committed page, if a newer version is committed
• Recovery: scan all pages, then look for cycles
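
Recovery and garbage collection can be sketched as a scan over the stored intention records. The sketch assumes a hypothetical dictionary `stored` that maps (page#, version#) to the next link, and it ignores the case where members of an old committed cycle have already been garbage collected; the function names are illustrative.

```python
def cycle_closed(start, stored):
    """True if following next links from `start` leads back to `start`."""
    cur, hops = stored.get(start), 0
    # Stop on a missing record, on returning to start, or after too many hops.
    while cur is not None and cur != start and hops <= len(stored):
        cur, hops = stored.get(cur), hops + 1
    return cur == start


def recover(stored):
    """Return (latest committed version per page, records eligible for cleaning)."""
    latest, garbage = {}, []
    for page, version in sorted(stored):      # ascending version within each page
        if cycle_closed((page, version), stored):
            if page in latest:
                # An older committed version superseded by a newer committed one.
                garbage.append((page, latest[page]))
            latest[page] = version
        else:
            # Uncommitted intention: its cycle is incomplete.
            garbage.append((page, version))
    return latest, garbage


stored = {
    ("A", 1): ("B", 1),
    ("B", 1): ("A", 1),
    ("A", 2): ("C", 1),   # ("C", 1) never reached storage
}
latest, garbage = recover(stored)
```

The two garbage cases in the code correspond to the two garbage collection bullets on the slide.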

New Proposal (2): Back Pointer Cyclic Commit

• Another way to deal with ambiguity
• Intention record:
  • (data, page# & version#, next-link, link to last committed version)

A3 is a straddler of A2

Some complexity in garbage collection and recovery because of this

Protocol Comparison

Outline

• Introduction
• The Case for TxFlash
• Commit Protocols
• Implementation
• Evaluation
• Conclusion

Implementation

• Simulator
  • DiskSim
  • Trace-driven SSD simulator (USENIX'08)
  • Modifications for TxFlash
  • Supports transactions of maximum size 4MB
• Pseudo-device driver for recording traces
• TxExt3:
  • Employs TxFlash for the Ext3 file system
  • Transaction: Ext3 journal commit

Experimental Setup

• TxFlash device:
  • 32GB: 8x 4GB flash packages
  • 4 I/O operations within every flash package
  • 15% of space reserved for garbage collection
• Workloads on top of Ext3:
  • IOzone: micro benchmark (no sync writes)
  • Linux-build (no sync writes)
  • Maildir (sync writes)
  • TPC-B: simulates 10,000 credit-debit-like operations on the TxExt3 file system (sync writes)
• Synthetic workloads

Cyclic commit vs. Traditional commit

Unlike database logging, large tranx sizes: no sync; data are included

• simple cyclic commit has a high cost if there are aborts

TxFlash vs. SSD

• Remove WriteAtomic from traces
• Use SSD simulator
• SSD does not provide any transaction guarantees (so it should have better performance)

Space comparison: TxFlash needs 25% more main memory than SSD

• 4+1 MB per 4GB flash: 40 MB for the 32GB TxFlash device
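
The arithmetic behind these figures can be checked with a few lines. The sketch reads "4+1 MB" as 4 MB already needed by a plain SSD plus 1 MB added by TxFlash per 4GB package, which is an interpretation of the slide rather than something it states outright; the constant names are illustrative.

```python
PACKAGES = 8                       # 32GB device built from 8 x 4GB flash packages
SSD_MB_PER_PACKAGE = 4             # assumed baseline memory per package
TXFLASH_EXTRA_MB_PER_PACKAGE = 1   # assumed extra memory for TxFlash per package

ssd_total_mb = PACKAGES * SSD_MB_PER_PACKAGE
txflash_total_mb = PACKAGES * (SSD_MB_PER_PACKAGE + TXFLASH_EXTRA_MB_PER_PACKAGE)
overhead = (txflash_total_mb - ssd_total_mb) / ssd_total_mb

print(ssd_total_mb, txflash_total_mb, overhead)
```

Under this reading the 25% figure is simply the 1 MB added on top of each 4 MB baseline.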

End-to-end performance

• TxFlash: run the pseudo-device driver on a real SSD
  • The performance is close to that of TxFlash
• Ext3: use the SSD as the journal
• SSD cache is disabled in both cases

Summary

• TxFlash: adding a transaction interface to the SSD
• Cyclic commit protocols
• Nice solution for file system journaling