Post on 21-Apr-2018

2012 Storage Developer Conference. © Nebula, Inc. All Rights Reserved.

Data integrity in the Cloud

Christoph Hellwig
Nebula, Inc

Data Integrity

Data integrity in brief:
  Writing data reliably to persistent storage
  Retrieving the same data again later

Application writes

[Diagram: Application -> OS Cache -> Disk]

Applications only write to the OS cache
  Reliable ACKs lost
Need fsync to:
  Transfer data to disk
  Get a reliable ACK
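
A minimal sketch of the fsync point, assuming a POSIX-like system and Python's standard os module. The helper name durable_write and the file name record.bin are made up for illustration. Note that os.fsync only asks the OS to push the data down; whether a volatile disk cache is flushed as well depends on the OS and the device.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path and return only after fsync has succeeded."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may still sit in the OS cache only.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "record.bin")  # hypothetical file name
        durable_write(target, b"important record\n")
        with open(target, "rb") as f:
            print(f.read())
```

A newly created file may also need an fsync on its parent directory before the directory entry itself is durable; that step is left out of this sketch.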

Operating System writes

[Diagram: Operating System -> Disk Cache -> Disk]

Often the OS writes to the disk cache only
  Reliable ACKs lost
Need a cache flush to:
  Transfer data to disk
  Get a reliable ACK

Write granularity

Byte-level writes are not atomic
OS caches operate on page granularity (usually 4096 bytes)
Disk drives operate on sectors (usually 512 bytes)
Striped RAID operates on much larger granularities (but pretends not to)
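
To make the granularities concrete, here is a small sketch that maps a byte range to the pages and sectors it touches. The sizes are the "usual" values from the slide; the function names are hypothetical.

```python
PAGE_SIZE = 4096    # typical OS cache page size
SECTOR_SIZE = 512   # typical disk sector size


def units_touched(offset, length, unit_size):
    """Return the unit indices covered by bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // unit_size
    last = (offset + length - 1) // unit_size
    return range(first, last + 1)


def is_aligned(offset, length, unit_size):
    """True if the byte range starts and ends on unit boundaries."""
    return offset % unit_size == 0 and length % unit_size == 0


if __name__ == "__main__":
    # A 10-byte write at offset 4090 straddles two pages and two sectors.
    print(list(units_touched(4090, 10, PAGE_SIZE)))    # [0, 1]
    print(list(units_touched(4090, 10, SECTOR_SIZE)))  # [7, 8]
    print(is_aligned(4090, 10, SECTOR_SIZE))           # False
```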

Write granularity issues

Too small writes need read-modify-write cycles
  May cause untouched data to be corrupted
Too large writes can be torn
  Half-updated data may be on disk
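
Both problems can be shown with a toy model. The sketch below assumes a simulated disk that only accepts whole 512-byte sectors; SimulatedDisk, small_write, large_write and the fail_after parameter are illustrative names, not a real API.

```python
SECTOR_SIZE = 512


class SimulatedDisk:
    """Toy disk that can only read or write whole sectors."""

    def __init__(self, sectors):
        self.data = bytearray(sectors * SECTOR_SIZE)

    def read_sector(self, index):
        start = index * SECTOR_SIZE
        return bytes(self.data[start:start + SECTOR_SIZE])

    def write_sector(self, index, payload):
        if len(payload) != SECTOR_SIZE:
            raise ValueError("only whole sectors can be written")
        start = index * SECTOR_SIZE
        self.data[start:start + SECTOR_SIZE] = payload


def small_write(disk, offset, payload):
    """Sub-sector write: read the sector, patch it in memory, write it back."""
    index, within = divmod(offset, SECTOR_SIZE)
    if within + len(payload) > SECTOR_SIZE:
        raise ValueError("this sketch handles a single sector only")
    sector = bytearray(disk.read_sector(index))
    sector[within:within + len(payload)] = payload
    # The untouched bytes of the sector are rewritten too, so an error
    # during this step can damage data the caller never modified.
    disk.write_sector(index, bytes(sector))


def large_write(disk, first_sector, payload, fail_after=None):
    """Multi-sector write; fail_after simulates a crash after that many sectors."""
    if len(payload) % SECTOR_SIZE:
        raise ValueError("payload must be a whole number of sectors")
    count = len(payload) // SECTOR_SIZE
    for i in range(count):
        if fail_after is not None and i >= fail_after:
            return i  # torn write: only the first i sectors reached the disk
        chunk = payload[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
        disk.write_sector(first_sector + i, chunk)
    return count


if __name__ == "__main__":
    disk = SimulatedDisk(4)
    small_write(disk, 10, b"hello")
    print(disk.read_sector(0)[8:17])
    done = large_write(disk, 2, b"B" * (2 * SECTOR_SIZE), fail_after=1)
    print(done, disk.read_sector(2)[:4], disk.read_sector(3)[:4])
```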

Reading data

As long as the disk and data are there that's easy, right?
Silent data corruption happens (a lot)
Disk might fail entirely and go away

Data protection

Disks use error correction to deal with bit flips
  Reads that succeed should return the right data
To protect against disk failure data is stored in multiple places
  N-way mirroring (e.g. RAID1)
  Erasure encoding (e.g. RAID5/6)
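
The two redundancy schemes can be sketched in a few lines. This is a simplified illustration, assuming equally sized blocks and single XOR parity in the style of RAID5; it ignores striping layout and is not how any particular RAID implementation is written. All function names are hypothetical.

```python
from functools import reduce


def mirror(block, copies):
    """N-way mirroring: keep `copies` identical copies of the block."""
    return [bytes(block) for _ in range(copies)]


def xor_blocks(blocks):
    """XOR equally sized blocks byte by byte."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same size")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the single missing block of a stripe from data plus parity."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)
    # Pretend the disk holding data[1] failed and went away.
    rebuilt = rebuild_missing([data[0], data[2], parity])
    print(rebuilt)  # b'BBBB'
```

Single parity survives the loss of one block per stripe; mirroring with N copies survives the loss of N - 1 copies.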

I/O architecture complexity

[Diagram: I/O stacks of increasing complexity for Personal Computer, Enterprise Server and Enterprise Virtualization. Layers shown: Guest, File System, Hypervisor, Volume Manager, Multipathing, Driver, RAID / SAN, Disk]

I/O architecture complexity

[Diagram: several Guests attached to "The Cloud!"]

The Cloud?

Cloud computing in I/O terms:
  Virtualization as abstraction
  Massive scale distributed systems (Object Storage)

Cloud I/O architectures (1)

[Diagram labels: Guest, Hypervisor, File System, Multipathing, Driver, RAID]

The “Enterprise” I/O stack
  Just treats “cloud” as a management layer
  Scale-out of compute and disk is handled independently

Cloud I/O architectures (2)

[Diagram labels: Guest, Hypervisor, File System, Driver, Disk]

“Instance” storage
  Local non-fault tolerant storage
  Used for instance-local temporary data
  Part of the AWS and OpenStack models

Cloud I/O architectures (3)

[Diagram labels: Guest, Hypervisor, Driver, Distributed storage system, Disk]

Distributed storage
  Access by internal networking
  Data stored on a large number of nodes

Cloud data integrity issues

Mapping between layers
  Complicated I/O stacks
  Mapping block I/O semantics onto file system semantics
Distributed systems
  Replication needs to be location-aware
  Failure is the norm

Without words

I/O layering issues

Caching semantics need to be preserved over all layers
  Including mapping them from block device layers to file systems and back
On distributed systems multiple writers complicate semantics a lot
  This may include live migration
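
A toy model of why every layer has to pass the flush on. The classes and the forwards_flush switch are invented for this sketch; a real stack (guest file system, hypervisor, host file system, disk cache) is far more involved.

```python
class Device:
    """Bottom of the stack: whatever reaches here counts as persistent."""

    def __init__(self):
        self.persistent = {}

    def write(self, key, value):
        self.persistent[key] = value

    def flush(self):
        pass  # nothing below this layer


class CachingLayer:
    """A layer that acknowledges writes from its cache until flushed."""

    def __init__(self, lower, forwards_flush=True):
        self.lower = lower
        self.forwards_flush = forwards_flush
        self.dirty = {}

    def write(self, key, value):
        self.dirty[key] = value  # acknowledged, but only cached

    def flush(self):
        if not self.forwards_flush:
            return  # broken layer: reports success without doing anything
        for key, value in self.dirty.items():
            self.lower.write(key, value)
        self.dirty.clear()
        self.lower.flush()


def build_stack(flush_flags):
    """Build a stack of caching layers, top layer first, over one device."""
    device = Device()
    top = device
    for flag in reversed(flush_flags):
        top = CachingLayer(top, forwards_flush=flag)
    return top, device


if __name__ == "__main__":
    good_top, good_dev = build_stack([True, True, True])
    good_top.write("block0", b"data")
    good_top.flush()
    print(good_dev.persistent)  # {'block0': b'data'}

    bad_top, bad_dev = build_stack([True, False, True])
    bad_top.write("block0", b"data")
    bad_top.flush()
    print(bad_dev.persistent)   # {}
```

A single layer that swallows the flush is enough to make the ACK at the top unreliable, even though every other layer behaves correctly.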

Data placement

[Diagram: three racks, each with one switch and six nodes]

Data placement

[Diagram: three racks, each with one switch and six nodes]

Good placement?

Data placement

[Diagram: three racks, each with one switch and six nodes]

Better placement?

Data placement

[Diagram: three racks, each with one switch and six nodes]

Or maybe this?
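
The placements highlighted in the diagrams did not survive transcription, so the following is only one possible reading of "location-aware": a sketch that treats the rack (and its switch) as the failure domain and puts every replica in a different rack. The topology mirrors the three racks of six nodes in the diagram; all names are hypothetical.

```python
import random


def build_topology(racks=3, nodes_per_rack=6):
    """Return a mapping of rack name to the node names in that rack."""
    return {
        f"rack{r}": [f"rack{r}-node{n}" for n in range(nodes_per_rack)]
        for r in range(racks)
    }


def place_replicas(topology, copies, rng=random):
    """Pick one node from each of `copies` distinct racks."""
    if copies > len(topology):
        raise ValueError("not enough racks for rack-disjoint placement")
    racks = rng.sample(sorted(topology), copies)
    return [(rack, rng.choice(topology[rack])) for rack in racks]


def survives_rack_loss(placement):
    """True if losing any single rack still leaves at least one replica."""
    return len({rack for rack, _ in placement}) >= 2


if __name__ == "__main__":
    topology = build_topology()
    placement = place_replicas(topology, copies=3, rng=random.Random(42))
    print(placement)
    print(survives_rack_loss(placement))  # True

    # All replicas behind one switch: a single failure takes out every copy.
    same_rack = [("rack0", "rack0-node0"), ("rack0", "rack0-node1")]
    print(survives_rack_loss(same_rack))  # False
```

Spreading across racks costs cross-switch bandwidth on every write, which is one reason the "best" placement is a trade-off rather than a fixed rule.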

In-flight data protection

Data might get corrupted in-flight
  Lots of potentially buggy I/O layers
  Large clusters → multiple hops through switches
Various protocols offer error checking or correction
  E.g. iSCSI crc32c
But we'd really like to be able to verify integrity through the whole I/O stack
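
As an illustration of per-hop error checking, here is a bitwise CRC32C (Castagnoli polynomial, the checksum iSCSI uses for its digests) with a send/receive pair around it. The send and receive helpers are made up for this sketch and have nothing to do with the actual iSCSI wire format; production code would use a table-driven or hardware-accelerated CRC.

```python
def crc32c(data):
    """Bitwise CRC32C using the reflected Castagnoli polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def send(payload):
    """Sender side: attach a checksum to the payload."""
    return payload, crc32c(payload)


def receive(payload, checksum):
    """Receiver side: reject the payload if it no longer matches."""
    if crc32c(payload) != checksum:
        raise IOError("payload corrupted in flight")
    return payload


if __name__ == "__main__":
    payload, checksum = send(b"123456789")
    print(hex(checksum))  # 0xe3069283
    print(receive(payload, checksum))
    try:
        receive(b"123456788", checksum)  # last byte damaged on the way
    except IOError as exc:
        print(exc)
```

A checksum like this only covers one hop or one protocol layer. Data corrupted before the checksum is computed, or after it is verified, goes unnoticed, which is the motivation for end-to-end verification.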

T10 DIF/DIX

End-to-end data protection
  Stores application checksum in extended disk sectors
  Standard format that can be validated by all intermediate layers
Issues:
  Requires disk support, but not part of the ATA standard
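
A simplified sketch of the extended-sector idea: each 512-byte sector gets 8 extra bytes holding a 16-bit guard CRC, a 16-bit application tag and a 32-bit reference tag. The layout and the CRC polynomial 0x8BB7 follow the commonly described T10 DIF format, but the function names are hypothetical and the reference tag handling assumes the variant where it carries the low 32 bits of the LBA. Details of the protection types are left out.

```python
import struct

SECTOR_SIZE = 512
PI_SIZE = 8  # guard (2 bytes) + application tag (2 bytes) + reference tag (4 bytes)


def crc16_t10dif(data):
    """Bitwise CRC-16 with polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect(sector, lba, app_tag=0):
    """Return the 520-byte extended sector: data plus protection information."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected a 512-byte sector")
    pi = struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)
    return sector + pi


def verify(extended, expected_lba):
    """Check guard and reference tag; any layer in the stack could run this."""
    if len(extended) != SECTOR_SIZE + PI_SIZE:
        return False
    sector, pi = extended[:SECTOR_SIZE], extended[SECTOR_SIZE:]
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref_tag == (expected_lba & 0xFFFFFFFF)


if __name__ == "__main__":
    extended = protect(b"\x5a" * SECTOR_SIZE, lba=7)
    print(len(extended))          # 520
    print(verify(extended, 7))    # True
    print(verify(extended, 8))    # False: data landed at the wrong block
    damaged = bytes([extended[0] ^ 0x01]) + extended[1:]
    print(verify(damaged, 7))     # False: bit flip caught by the guard
```

The reference tag catches misdirected writes that a plain data checksum would miss, since the data itself is intact but stored under the wrong address.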

Questions?