Parity Lost and Parity Regained
Andrew Krioukov, Lakshmi N. Bairavasundaram,
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
University of Wisconsin - Madison
Garth R. Goodson, Kiran Srinivasan, Randy Thelen
Bare-bones RAID
• Stripe data across multiple drives
• Store redundant parity data
• Can reconstruct data with any single disk failure
• Will RAID protect data in all single failure cases?
[Figure: RAID stripe with blocks A, B, C on disks Data 1, Data 2, Data 3 and P(ABC) on the Parity disk]
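The reconstruction claim above can be illustrated with a minimal sketch. It assumes the parity block is the byte-wise XOR of the data blocks, as in common single-parity RAID levels; the 4-byte block contents and the helper name `xor_blocks` are hypothetical.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte."""
    assert len({len(b) for b in blocks}) == 1, "blocks must have equal length"
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Hypothetical 4-byte blocks standing in for A, B, C on three data disks.
A, B, C = b"AAAA", b"BBBB", b"CCCC"
P = xor_blocks(A, B, C)            # parity block P(ABC)

# The disk holding C fails completely: rebuild C from the survivors and parity.
rebuilt_C = xor_blocks(A, B, P)
```

Reconstruction only works because the RAID layer knows which disk failed; nothing in the XOR itself says which block is wrong.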
Bare-bones RAID Problems
• Stripe contains file ABC consisting of 3 blocks
• RAID has redundancy to recover data
• RAID does not detect corruption
[Figure: RAID stripe A, B, C, P(ABC) across Data 1, Data 2, Data 3, Parity; block C is corrupted (@#$%), so reading file ABC returns a corrupt file]
Bare-bones RAID Problems
• RAID cannot detect partial disk failures:
– Corruptions
– Torn writes
– Lost writes
– Misdirected writes
• RAID only protects against
– Complete disk failures
– Errors reported by the disk (e.g. Latent Sector Errors)
Data Protection Techniques
• Need improvements to bare-bones RAID
– Techniques needed to help detect errors
• Checksums are common
– Many kinds: block, sector, parent checksums
• Which types of checksums are used?
• We examined real systems to determine protection schemes
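Enterprise RAID Systems
• Mixed bag of protections
[Table: protections used by each system. Columns: Scrub, Sector Cksum, Block Cksum, Parent Cksum, Write Verify, Phys Ident, Logical Ident, Write Stamp. Check marks per row, with column positions not preserved: Dell PowerVault √ √ √; Hitachi Thunder √ √ √; NetApp ONTAP √ √ √ √ √; Sun ZFS √ √]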
Question
• Which errors do these systems protect against?
• How can we ensure complete data protection?
• Need method to identify all corruption & data loss scenarios in a design
Model Checking Solution
• Create a model of storage system design using primitives
• Checker exhaustively searches space of all possible states
– Start with clean RAID stripe
– Apply single disk error
– Apply any number of disk operations (e.g. write)
• Identifies all possible data loss scenarios
Results Summary
• Applied model checking on enterprise RAID system designs
• For all designs, a single error can cause data loss
• Identified a common problem, parity pollution
– Partial disk failure goes undetected
– The erroneous data is used to compute parity
– Recovery is no longer possible
• Presented a design that protects against all single failures
Outline
• Introduction
• Background: Storage Errors
• Model Checking Approach
• Data Protection Design & Analysis
• Conclusion
Storage Errors
• Latent Sector Errors
– Data is inaccessible
– Explicit error code returned
– Affect 19% of nearline, 2% of enterprise disks in 2 years [Bairavasundaram et al. SIGMETRICS’07]
• Corruptions
– Data is silently corrupted
– Affect 0.6% of nearline and 0.06% of enterprise disks in 17 months [Bairavasundaram et al. FAST’08]
• Reality: Partial disk failures happen
Storage Errors (Cont’d)
• Torn Writes
– Only part of a block is written
– Some sectors are lost
– Write returns success code
• Lost Writes
– Write returns success code
– Data not reflected on disk
[Figure: for each error, a block holding A receives "Write B" and the disk returns Success]
Storage Errors (Cont’d)
• Misdirected Writes
– Write goes to wrong location (either wrong block or wrong disk)
– Combination of lost write and corruption
[Figure: an overwrite of A with A’ returns Success, but A’ lands on the block holding B]
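The three write failures described above can be mimicked with a toy disk model. This is only a sketch: the class `FaultyDisk`, the 4-sector block geometry, and the rule that a misdirected write lands on the next block are assumptions made for illustration.

```python
SECTORS_PER_BLOCK = 4  # assumed block geometry for this sketch

class FaultyDisk:
    """Toy disk whose next write can be hit by one injected partial failure."""

    def __init__(self, num_blocks: int):
        self.blocks = [["-"] * SECTORS_PER_BLOCK for _ in range(num_blocks)]
        self.fault = None  # one of None, "lost", "torn", "misdirected"

    def write(self, block_no: int, sectors: list) -> str:
        fault, self.fault = self.fault, None
        if fault == "lost":
            pass                                           # nothing reaches the media
        elif fault == "torn":
            half = SECTORS_PER_BLOCK // 2
            self.blocks[block_no][:half] = sectors[:half]  # only the first half lands
        elif fault == "misdirected":
            wrong = (block_no + 1) % len(self.blocks)
            self.blocks[wrong] = list(sectors)             # lands on the wrong block
        else:
            self.blocks[block_no] = list(sectors)
        return "success"                                   # the disk always reports success

    def read(self, block_no: int) -> list:
        return list(self.blocks[block_no])

disk = FaultyDisk(2)
disk.write(0, list("AAAA"))
disk.fault = "torn"
status = disk.write(0, list("BBBB"))   # reports success, block 0 is now "BBAA"
```

In every case the caller sees the same success code, which is why these failures need extra on-disk information to be detected.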
Modeling Storage System
• Use primitives to describe:
– On disk layout in terms of sectors
– Data protections
• Checker uses built-in models:
– Storage errors
– Disk operations (e.g. Read/Write)
– Basic RAID functionality
Model Checking
• Assumptions
– Single RAID stripe
– Single storage error
– Single parity protection
– Data disks are interchangeable
• Apply error followed by any number of disk operations
• Generate state diagram with all data loss states
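The search described above can be sketched as a breadth-first exploration of a tiny bare-bones RAID model. This is not the authors' checker: the two-disk stripe, one-bit block values, state labels, and the two parity-update styles (recompute from all data blocks, or fold the old and new block into the old parity) are assumptions chosen to keep the state space small.

```python
from collections import deque
from itertools import product

N = 2            # data disks in the single modeled stripe (assumption)
VALUES = (0, 1)  # one-bit block contents keep the state space tiny

def parity_of(data):
    p = 0
    for v in data:
        p ^= v
    return p

def successors(state):
    """All states reachable by one full-block write (no further errors)."""
    data, parity, expected = state
    for x, v in product(range(N), VALUES):
        new_data = data[:x] + (v,) + data[x + 1:]
        new_expected = expected[:x] + (v,) + expected[x + 1:]
        # Recompute parity from the new block and the other data disks.
        yield ("Wadd", x, v), (new_data, parity_of(new_data), new_expected)
        # Fold old and new contents of disk x into the old parity.
        yield ("Wsub", x, v), (new_data, parity ^ data[x] ^ v, new_expected)

def classify(state):
    data, parity, expected = state
    if data != expected:
        # Bare-bones RAID cannot detect this, so a read returns the bad block.
        return "polluted parity" if parity == parity_of(data) else "disk error"
    return "clean" if parity == parity_of(data) else "parity error"

def check(initial_errors):
    """Breadth-first search over every state reachable after one error."""
    seen, queue = set(initial_errors), deque(initial_errors)
    while queue:
        state = queue.popleft()
        for _, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {classify(s) for s in seen}, seen

# A state is (data on disk, parity on disk, data the user expects).
clean = ((0, 0), 0, (0, 0))
corrupt_disk0 = ((1, 0), 0, (0, 0))    # disk 0 silently holds 1 instead of 0
corrupt_parity = ((0, 0), 1, (0, 0))   # parity block silently flipped
labels, states = check([corrupt_disk0, corrupt_parity])
```

In this toy model a single corruption on a data disk can reach a state where the parity agrees with the bad data, so reconstruction can no longer restore the expected contents.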
State Diagram Example
• Bare-bones RAID state diagram
[Figure: state diagram with states Clean, Parity Error, Disk x Error, Polluted Parity, and Corrupt Data. Corrupt(p), Torn(p), Lost(p), Misdir(p) lead to Parity Error; Corrupt(x), Torn(x), Lost(x), Misdir(x) lead to Disk x Error. Remaining edge labels: R(x), W(x+), Wadd(), Wadd(x+), Wadd(!x), Wsub(x+)]
Data Protection Design
• Need fault tolerance for all partial failures
• Bare-bones RAID handles latent sector errors and complete disk failures
• Corruption is next most common failure
• Add protections cumulatively until design has complete protection
Protections
Protections in red will be discussed in the talk
• Scrubbing
• Sector checksums
• Block checksums
• Parental checksums
• Write verify
• Physical identity
• Logical identity
• Version mirroring
Checksums
• Checksum per data block
• Checksum per sector
• Parent checksum
– Checksum stored in parent inode
[Figure: block A stored with cksum(A); sectors a1, a2, … of A stored with per-sector checksums ck(a1), ck(a2), …]
Checksum Example
• Corruption scenario is now fixed
[Figure: A, B, C and P(ABC) on Data 1, Data 2, Data 3, Parity, each stored with its checksum. Block C is corrupted (@#$%) while cksum(C) is unchanged. Read file ABC detects the mismatch, performs reconstruction of C from A, B, and P(ABC), and the file is valid]
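A minimal sketch of the checksum example: each block is stored with a checksum, a mismatch on read triggers reconstruction from the rest of the stripe. CRC-32 from Python's `zlib`, XOR parity, and the helper names `store` and `read_block` are assumptions for illustration.

```python
import zlib
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

def store(data: bytes) -> dict:
    """A block stored together with its checksum (CRC-32 is an assumption)."""
    return {"data": data, "cksum": zlib.crc32(data)}

def read_block(stripe: list, i: int) -> bytes:
    """Return block i, rebuilding it from the other blocks if its checksum fails."""
    blk = stripe[i]
    if zlib.crc32(blk["data"]) == blk["cksum"]:
        return blk["data"]
    others = [b["data"] for j, b in enumerate(stripe) if j != i]
    rebuilt = xor_blocks(*others)
    stripe[i] = store(rebuilt)          # repair the bad block in place
    return rebuilt

A, B, C = b"AAAA", b"BBBB", b"CCCC"
stripe = [store(A), store(B), store(C), store(xor_blocks(A, B, C))]
stripe[2]["data"] = b"@#$%"             # silent corruption of C; checksum untouched
file_abc = b"".join(read_block(stripe, i) for i in range(3))
```

The checksum turns a silent corruption into a detected error, which RAID can then treat like any other single-block failure.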
Checksum Problems
• Great for protecting against corruption errors
• Fails to protect when data and checksum are lost together:
– Lost write (with any type of checksums)
– Torn write (only with sector checksums)
• Parity pollution can occur
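The torn-write case can be sketched as follows, assuming each sector carries its own CRC-32 in the sector-checksum scheme and a single CRC-32 covers the whole block in the block-checksum scheme. The 4-byte sector size and block contents are hypothetical.

```python
import zlib

SECTOR = 4  # assumed sector size in bytes for this sketch

def split(block: bytes) -> list:
    return [block[i:i + SECTOR] for i in range(0, len(block), SECTOR)]

old, new = b"aaaabbbb", b"xxxxyyyy"

# Sector checksums: every sector is stored next to its own checksum.
on_disk = [(s, zlib.crc32(s)) for s in split(old)]
incoming = [(s, zlib.crc32(s)) for s in split(new)]
on_disk[0] = incoming[0]                # torn write: only the first sector lands
sector_check_passes = all(zlib.crc32(s) == c for s, c in on_disk)

# Block checksum: one checksum covers the whole block, so a mix of old and
# new sectors matches neither the old nor the new checksum.
torn = b"".join(s for s, _ in on_disk)
block_check_passes = zlib.crc32(torn) in (zlib.crc32(old), zlib.crc32(new))
```

Each surviving sector is individually valid, so only a checksum that spans the block notices the mix of old and new data.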
Checksum Problems – Lost Write
• Block checksums
[Figure: A, B, C and P(ABC) on Data 1, Data 2, Data 3, Parity, each stored with its checksum. Overwrite C→C’ updates parity to P(ABC’), but the write of C’ is lost and Data 3 still holds C with cksum(C). Read file ABC’ passes the checksum and returns corrupt data (C instead of C’)]
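A short sketch of why a block checksum cannot catch a lost write: the old data and the old checksum are both still on disk, so they still agree. CRC-32 and the helper names are assumptions for illustration.

```python
import zlib

def store(data: bytes) -> dict:
    return {"data": data, "cksum": zlib.crc32(data)}

def verify(blk: dict) -> bool:
    return zlib.crc32(blk["data"]) == blk["cksum"]

disk3 = store(b"CCCC")                  # Data 3 holds C with cksum(C)

def lost_write(target: dict, new_data: bytes) -> str:
    """The disk acknowledges the write but leaves the old block untouched."""
    return "success"

status = lost_write(disk3, b"C'C'")     # overwrite C -> C' is lost
checksum_ok = verify(disk3)             # old data still matches old checksum
returned = disk3["data"]                # stale C is returned as if it were C'
```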
Write Verify
• Attempt to solve lost write problem
• Costly solution, expect good protection
• Procedure:
1. Write data to disk
2. Read back to verify
3. If lost write detected, write again or remap to new location
[Figure: overwrite C→C’ is lost and Data 3 still holds C with cksum(C); read back returns C, the lost write is detected, and C’ is written again, succeeding with cksum(C’)]
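The write-verify procedure can be sketched against a toy disk that drops a configurable number of writes. The class `LossyDisk`, the retry limit, and the I/O counter are assumptions added for illustration; remapping to a new location is left out.

```python
class LossyDisk:
    """Toy disk that silently drops a configurable number of writes."""

    def __init__(self, num_blocks, writes_to_lose=0):
        self.blocks = [b""] * num_blocks
        self.writes_to_lose = writes_to_lose
        self.io_count = 0

    def write(self, block_no, data):
        self.io_count += 1
        if self.writes_to_lose > 0:
            self.writes_to_lose -= 1     # lost write: report success anyway
        else:
            self.blocks[block_no] = data
        return "success"

    def read(self, block_no):
        self.io_count += 1
        return self.blocks[block_no]

def write_verify(disk, block_no, data, max_retries=3):
    """Write, read back, and write again if the data did not reach the disk."""
    for _ in range(max_retries):
        disk.write(block_no, data)
        if disk.read(block_no) == data:
            return True
    return False   # a real system might remap to a new location here

disk = LossyDisk(num_blocks=4, writes_to_lose=1)
disk.blocks[2] = b"C"                    # Data 3 initially holds C
ok = write_verify(disk, 2, b"C'")        # first attempt is lost, second lands
```

Even when nothing goes wrong, every write costs one extra read, which `io_count` makes visible.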
Write Verify Problems
• Protects against lost writes
• Susceptible to misdirected writes
– Cannot detect/recover the overwritten data
Write Verify – Misdirected Write
[Figure: initially, one stripe holds A, B, C, P(ABC) on Data 1, Data 2, Data 3, Parity, each with its checksum, and another holds X, Y, Z, P(XYZ). Overwrite X→X’ is misdirected onto A. Read back of X looks like a lost write, so X is re-written, giving X’ and P(X’YZ). Later, read file ABC finds X’ with a valid cksum(X’) in place of A and returns corrupt data (A has been corrupted)]
Physical Identity
• Protection against misdirected writes
• Store disk & block number of destination in each block
[Figure: each block stores (Data, Block Number): (A, 1) and (B, 2). Overwrite of block 1, A→A’, is misdirected onto block 2. Read of block 2 returns (A’, 1); block number does not match (1≠2), so the misdirected write is detected]
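The physical identity check can be sketched as below. The dictionary layout and the helper names `make_block` and `read_checked` are hypothetical; only the idea of storing the destination disk and block number inside each block comes from the slide.

```python
def make_block(data, disk_no, block_no):
    """Embed the intended destination (disk, block) in the block itself."""
    return {"data": data, "disk": disk_no, "block": block_no}

def read_checked(disks, disk_no, block_no):
    blk = disks[disk_no][block_no]
    if (blk["disk"], blk["block"]) != (disk_no, block_no):
        raise IOError(
            f"misdirected write detected: found ({blk['disk']}, {blk['block']}), "
            f"expected ({disk_no}, {block_no})"
        )
    return blk["data"]

# One disk (number 0) with block 1 holding A and block 2 holding B.
disks = {0: {1: make_block("A", 0, 1), 2: make_block("B", 0, 2)}}

# The overwrite of block 1 with A' is misdirected and lands on block 2.
disks[0][2] = make_block("A'", 0, 1)

try:
    read_checked(disks, 0, 2)
    detected = False
except IOError:
    detected = True
```

Block 1 still holds the old A with a matching identity, so the lost-write half of a misdirected write has to be caught by a different protection.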
Problem Solved?
• Write verify with block checksums and physical identity offers complete protection
• But… twice the I/O cost!
• Need a more efficient solution
Logical Identity
• Less expensive protection against lost writes
• Store file identifier (e.g. inode number) in each data block
• Test that file identifier matches on a read
[Figure: block holds A with cksum(A) and identifier File 0. Overwrite of File 0 with File 1 (X) is lost. Read File 1 finds File 0; logical ID does not match, so the lost write is detected]
Logical Identity Problem
• Cannot be verified when re-computing parity
– Not reading a file
• Parity pollution may occur
Parity Pollution Example
[Figure: stripe across Data 1, Data 2, Data 3, Parity; each data block stores a checksum and a logical identity]
• Initially: A (File 0), B (File 0), C (File 0), parity P(ABC)
• Write File 1: overwrite C→C’, parity becomes P(ABC’); the write of C’ is lost, so Data 3 still holds C (File 0)
• Later… Write File 2: overwrite AB→A’B’; Data 3 is read to compute the new parity, giving P(A’B’C)
• Parity is now consistent with invalid data
• Later… Read File 1: logical ID mismatch (File 0 ≠ File 1); reconstruction returns the same stale C because the data is consistent with the parity; data loss is reported
• What should be on the disk: A’ (File 2), B’ (File 2), C’ (File 1), P(A’B’C’)
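The parity pollution sequence can be replayed in a few lines. This sketch assumes XOR parity and uses hypothetical 2-byte block contents, where the trailing digit names the owning file.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

# Each data block is (data, logical file id); values are hypothetical.
stripe = {1: (b"A0", 0), 2: (b"B0", 0), 3: (b"C0", 0)}
parity = xor_blocks(b"A0", b"B0", b"C0")                  # P(ABC)

# Write File 1: C -> C'. Parity is updated, but the data write is lost,
# so stripe[3] is deliberately left unchanged.
parity = xor_blocks(parity, b"C0", b"C1")                 # P(ABC')
recoverable_before = xor_blocks(b"A0", b"B0", parity) == b"C1"

# Later, write File 2: A, B -> A', B'. The new parity is recomputed from the
# new blocks plus whatever Data 3 currently holds (the stale C).
stripe[1], stripe[2] = (b"A2", 2), (b"B2", 2)
parity = xor_blocks(stripe[1][0], stripe[2][0], stripe[3][0])   # P(A'B'C)

# Later, read File 1 from Data 3: logical identity catches the stale block...
data, file_id = stripe[3]
mismatch = file_id != 1
# ...but reconstruction from the polluted parity only returns stale C again.
reconstructed = xor_blocks(stripe[1][0], stripe[2][0], parity)
```

Right after the lost write the stripe could still have produced C’; the later parity recomputation is what destroys the only remaining copy.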
Version Mirroring
• Lost write protection
• Verifiable at RAID level
• Store a version number in each data block
• Mirror the version numbers on parity disk
• Version numbers verified on read
[Figure: A, B, C each stored with a checksum and Ver0; P(ABC) stored with cksum(P) and the mirrored versions 0,0,0]
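Parity Pollution Solved
[Figure: stripe across Data 1, Data 2, Data 3, Parity; each data block stores a checksum and a version number, and the parity block mirrors all three versions]
• Initially: A (Ver0), B (Ver0), C (Ver0), parity P(ABC) with versions 0,0,0
• Write File 1: overwrite C→C’, parity becomes P(ABC’) with versions 0,0,1; the write of C’ is lost, so Data 3 still holds C (Ver0)
• Later… Write File 2: overwrite AB→A’B’; Data 3 is read to compute the new parity
• Version mismatch (Ver0 on Data 3, Ver1 on parity): reconstruct Data 3 from A, B, and P(ABC’), giving C’ with cksum(C’) and Ver1
• New parity: P(A’B’C’) with versions 1,1,1
• What should be on the disk: A’ (Ver1), B’ (Ver1), C’ (Ver1), P(A’B’C’)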
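The same sequence with version mirroring can be sketched as below. It assumes XOR parity, hypothetical 2-byte blocks, and that a version mismatch always means the data block is the stale side; the helper `read_verified` is not from the talk.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

# Data blocks are (data, version); the parity block mirrors all versions.
data = [(b"A0", 0), (b"B0", 0), (b"C0", 0)]
parity = {"p": xor_blocks(b"A0", b"B0", b"C0"), "versions": [0, 0, 0]}

def read_verified(i):
    """Read data block i, rebuilding it if its version disagrees with parity."""
    blk, ver = data[i]
    if ver == parity["versions"][i]:
        return blk
    others = [d for j, (d, _) in enumerate(data) if j != i]
    rebuilt = xor_blocks(parity["p"], *others)
    data[i] = (rebuilt, parity["versions"][i])    # repair the stale block
    return rebuilt

# Write File 1: C -> C'. Parity and its mirrored version are updated,
# but the data write to Data 3 is lost.
parity["p"] = xor_blocks(parity["p"], b"C0", b"C1")
parity["versions"][2] = 1

# Later, write File 2: A, B -> A', B'. Reading Data 3 for the parity
# computation goes through the version check and repairs it first.
c_now = read_verified(2)
data[0], data[1] = (b"A1", 1), (b"B1", 1)
parity = {"p": xor_blocks(b"A1", b"B1", c_now), "versions": [1, 1, 1]}
```

Because the check runs at the RAID level, the stale block is caught before it is used to compute the new parity, while the old A and B are still available for reconstruction.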
Problem Solved… Efficiently
• Version mirroring with block checksums and physical identity provides complete protection
• Use with logical identity for efficiency
• More efficient than write verify
Conclusion
• Applied model checking on real system designs
– For all designs, a single error can cause data loss
– Parity pollution is a common problem
– Version mirroring is a key technique to offering complete and efficient data protection
• Partial failures are complex, no obvious data protection solution
– Model checking is useful
ADvanced Systems Laboratory
www.cs.wisc.edu/adsl
Advanced Technology Group
http://www.netapp.com/company/research/