raid: high-performance, reliable secondary storage

RAID High Performance, Reliable Secondary Storage Uğur Tılıkoğlu Gebze Institute of Technology

Posted on 06-May-2015

TRANSCRIPT

  • 1. High Performance, Reliable Secondary Storage. Uğur Tılıkoğlu, Gebze Institute of Technology

2. Overview
Introduction
Background: Disk Terminology, Data Paths, Technology Trends
Disk Array Basics: Data Striping and Redundancy, Basic RAID Organizations, Performance and Cost Comparisons, Reliability, Implementation Considerations

3. Overview (continued)
Advanced Topics: Improving Small Write Performance for RAID Level 5, Declustered Parity, Exploiting On-Line Spare Disks, Data Striping in Disk Arrays, Performance and Reliability Modeling
Opportunities for Future Research: Experience with Disk Arrays, Interaction among New Organizations, Scalability, Massively Parallel Computers and Small Disks, Latency

4. Introduction
RAID: Redundant Arrays of Inexpensive / Independent Disks
Improvements in microprocessors and memory systems demand larger, higher-performance secondary storage systems
Microprocessor performance is increasing faster than disk performance
Disk arrays combine multiple independent disks into one large, high-performance logical disk

5. Background
Disk Terminology, Data Paths, Technology Trends

6. Disk Terminology

7. Data Paths

8. Technology Trends

9. Disk Array Basics
Data Striping and Redundancy, Basic RAID Organizations, Performance and Cost Comparisons, Reliability, Implementation Considerations

10. Data Striping and Redundancy
Data striping distributes data over multiple disks so that requests are serviced in parallel
More disks, more performance

11. Data Striping and Redundancy
More disks, lower reliability: 100 disks have 1/100 the reliability of a single disk
Redundancy schemes fall into two categories along two axes: the granularity of data interleaving, and the method of computing the redundant information and distributing it across the disk array

12. Data Striping and Redundancy
Fine-grained data interleaving
Advantages: every access uses all the disks, giving a high transfer rate
Disadvantages: only one I/O request can be serviced at a time, and all disks waste time positioning for every request

13. Data Striping and Redundancy
Coarse-grained data interleaving
Advantages: multiple small requests can be serviced simultaneously, while large requests can still access all the disks
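The coarse-grained (block-interleaved) layout of slides 12-13 can be sketched as a simple round-robin mapping from logical blocks to disks. The function name `locate`, the parameter `num_disks`, and the round-robin layout are illustrative assumptions, not details taken from the slides:

```python
# Sketch of coarse-grained (block-interleaved) data striping: map a
# logical block number onto (disk index, block offset on that disk).

def locate(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Return (disk, offset) for a logical block striped round-robin."""
    return logical_block % num_disks, logical_block // num_disks

# Consecutive blocks land on consecutive disks, so a large request
# spanning num_disks blocks keeps every spindle busy in parallel,
# while a single-block request touches only one disk.
assert locate(0, 4) == (0, 0)
assert locate(5, 4) == (1, 1)
```

This illustrates why coarse-grained interleaving serves many small requests simultaneously: independent small requests usually map to different disks.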
14. Data Striping and Redundancy
Redundancy raises two main problems: computing the redundant information (usually parity), and selecting a method for distributing the redundant information across the disk array

15. Basic RAID Organizations
Nonredundant (RAID Level 0)
Lowest cost and best write performance, but not the best read performance
Any single disk failure results in data loss

16. Basic RAID Organizations
Mirrored (RAID Level 1)
Uses twice the number of disks; data is also written to a redundant disk
If a disk fails, the other copy is used

17. Basic RAID Organizations
Memory-Style ECC (RAID Level 2)
Contains parity disks; the number of parity disks grows with the logarithm of the number of data disks, so efficiency improves as the number of data disks increases
Multiple parity disks are needed to identify the failed disk, but only one is needed to recover

18. Basic RAID Organizations
Bit-Interleaved Parity (RAID Level 3)
Data is interleaved bit-wise and a single parity disk is used
The disk controller can identify which disk has failed
Reads access all the data disks; writes access all the data disks plus the parity disk

19. Basic RAID Organizations
Block-Interleaved Parity (RAID Level 4)
Same as Level 3, but blocks (striping units) are interleaved
Reads and writes smaller than a striping unit access only one disk
Parity update: XOR the new data with the old data and the old parity
A small write takes four I/Os: read the old data and old parity, write the new data and new parity
The single parity disk becomes a bottleneck

20. Basic RAID Organizations
Block-Interleaved Distributed-Parity (RAID Level 5)
Distributing the parity solves the Level 4 bottleneck
Best small read, large read, and large write performance
Small writes are inefficient because of the read-modify-write cycle

21. Basic RAID Organizations
P + Q Redundancy (RAID Level 6)
Uses stronger codes to tolerate multiple failures
Operates in much the same manner as Level 5
Small writes are even less efficient, requiring six I/Os to update both the P and Q information
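The small-write parity update of slide 19 (new parity = old parity XOR old data XOR new data) can be sketched with byte strings standing in for disk blocks; the function names here are illustrative:

```python
# Sketch of the RAID level 4/5 small-write parity update:
# new_parity = old_parity XOR old_data XOR new_data.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def small_write_parity(old_data: bytes, new_data: bytes,
                       old_parity: bytes) -> bytes:
    """Compute the new parity without reading the other data disks."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)

# Sanity check on a stripe with two data blocks plus parity: updating
# one block via read-modify-write yields the same parity as recomputing
# it from scratch across the whole stripe.
d0, d1 = b"\x0f\xf0", b"\x33\xcc"
parity = xor_blocks(d0, d1)
new_d0 = b"\xaa\x55"
assert small_write_parity(d0, new_d0, parity) == xor_blocks(new_d0, d1)
```

This is also why the small write costs four I/Os: the old data and old parity must both be read before the new data and new parity can be written.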
22. Performance and Cost Comparisons
Ground rules and observations: reliability, performance, and cost
Disk arrays are throughput oriented: I/Os per second per dollar
Configuration: by adjusting the striping unit, RAID 5 can operate as RAID 1 or RAID 3

23. Performance and Cost Comparisons
Comparisons: small reads and writes

24. Performance and Cost Comparisons
Comparisons: large reads and writes

25. Performance and Cost Comparisons
Comparisons: RAID 3, 5, and 6

26. Reliability
Basic reliability, RAID 5
MTTF: mean time to failure; MTTR: mean time to repair; N: total number of disks; G: parity group size
With 100 disks each having an MTTF of 200,000 hours, an MTTR of 1 hour, and a parity group size of 16, the mean time to failure of the system is about 3,000 years!

27. Reliability
Basic reliability, RAID 6
MTTF: mean time to failure; MTTR: mean time to repair; N: total number of disks; G: parity group size
With 100 disks each having an MTTF of 200,000 hours, an MTTR of 1 hour, and a parity group size of 16, the mean time to failure of the system is about 38,000,000 years!

28. System Crashes and Parity Inconsistency
System crashes: power failures, operator errors, hardware breakdowns, software crashes, etc.
They cause parity inconsistencies in both bit-interleaved and block-interleaved disk arrays
System crashes may occur more frequently than disk failures
To avoid the loss of parity on system crashes, information sufficient to recover the parity must be logged to non-volatile storage (NVRAM) before each write operation

29. Uncorrectable Bit Errors
What is a bit error? It is unclear whether the data was incorrectly written or the magnetic media gradually became damaged
Some manufacturers have developed an approach that monitors the warnings given by disks and notifies an operator when it appears a disk is about to fail

30. Correlated Disk Failures
Environmental and manufacturing factors, for example an earthquake

31. Reliability Revisited
Double disk failure
System crash followed by a disk failure
Disk failure followed by an uncorrectable bit error during reconstruction
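The RAID 5 figure on slide 26 follows from the independent-failure model commonly used in the RAID literature, MTTF_system = MTTF_disk^2 / (N * (G - 1) * MTTR). The slide does not state the formula itself, so it is used here as an assumption; the computation reproduces the "about 3,000 years" number:

```python
# Sketch of the RAID level 5 mean-time-to-data-loss estimate, assuming
# the standard independent-failure model from the RAID literature:
# MTTF_system = MTTF_disk**2 / (N * (G - 1) * MTTR).

def raid5_mttdl_hours(mttf_disk: float, n: int, g: int,
                      mttr: float) -> float:
    """Mean time to data loss of an N-disk RAID 5 with group size G."""
    return mttf_disk ** 2 / (n * (g - 1) * mttr)

# Slide 26 parameters: 100 disks, 200,000-hour disk MTTF,
# 1-hour MTTR, parity group size 16.
hours = raid5_mttdl_hours(mttf_disk=200_000, n=100, g=16, mttr=1)
years = hours / 8760
assert 2900 < years < 3100  # roughly 3,000 years, matching the slide
```

A second, independent failure must strike the same parity group within the one-hour repair window, which is why the system MTTF vastly exceeds that of any single disk.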
32. Reliability Revisited

33. Reliability Revisited

34. Reliability Revisited

35. Implementation Considerations
Avoiding stale data: when a disk fails, it must be marked invalid; the invalid mark prevents users from reading corrupted data on the failed disk
When an invalid logical sector is reconstructed to a spare disk, the logical sector must be marked valid again

36. Implementation Considerations
Regenerating parity after a system crash
Before servicing any write request, the corresponding parity sectors must be marked inconsistent
When bringing a system up after a crash, all inconsistent parity sectors must be regenerated

37. Implementation Considerations
Operating with a failed disk
Demand reconstruction: an access to a parity stripe with an invalid sector immediately triggers reconstruction of the appropriate data onto a spare disk; a background process scans the entire disk
Parity sparing: before servicing a write request, the invalid sector is reconstructed and relocated to overwrite its corresponding parity sector

38. Implementation Considerations
Orthogonal RAID

39. Advanced Topics
Improving Small Write Performance for RAID Level 5, Declustered Parity, Exploiting On-Line Spare Disks, Data Striping in Disk Arrays, Performance and Reliability Modeling

40. Improving Small Write Performance for RAID Level 5
Buffering and caching
Write buffering (asynchronous writes): collect small writes in a buffer and issue them as one large write
Read caching: reduces the four I/O accesses to three, since the old data is read from the cache

41. Improving Small Write Performance for RAID Level 5
Floating parity
Shortens the read-modify-write time
Requires many free blocks
The new parity block is written to the rotationally nearest unallocated block following the old parity block
Implemented in the disk controller
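The ordering rule on slide 36 (mark parity inconsistent before the write, regenerate all still-marked sectors at boot) can be sketched as follows. The in-memory set standing in for an NVRAM bitmap and all names are illustrative assumptions:

```python
# Sketch of crash-safe parity maintenance: mark a stripe's parity
# inconsistent in stable storage before writing, clear the mark after,
# and regenerate every still-marked sector when recovering from a crash.

inconsistent: set[int] = set()  # stand-in for an NVRAM bitmap

def write_stripe(sector: int, do_write) -> None:
    inconsistent.add(sector)   # must reach non-volatile storage first
    do_write()                 # data and parity updates happen here
    inconsistent.discard(sector)

def recover_after_crash(regenerate) -> None:
    # Any sector still marked may hold stale parity; rebuild it.
    for sector in sorted(inconsistent):
        regenerate(sector)
    inconsistent.clear()

# A write that completes normally leaves no marks behind.
write_stripe(7, lambda: None)
assert inconsistent == set()
```

If the system crashes between the mark and the clear, the mark survives in non-volatile storage, so recovery knows exactly which parity sectors to regenerate instead of rescanning the whole array.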
42. Improving Small Write Performance for RAID Level 5
Floating parity (repeat of slide 41)

43. Improving Small Write Performance for RAID Level 5
Parity logging
Delays the read of the old parity and the write of the new parity
The difference is temporarily logged
Log entries are grouped together so that large contiguous blocks can be updated more efficiently

44. Declustered Parity
Distributes the increased reconstruction load uniformly over all disks

45. Exploiting On-Line Spare Disks
Distributed sparing

46. Exploiting On-Line Spare Disks
Parity sparing

47. Data Striping in Disk Arrays
Disk positioning time is wasted work; idle time is just as wasteful
Data striping, or interleaving, distributes data among multiple disks
Researchers study the striping unit size that maximizes throughput

48. Data Striping in Disk Arrays
P: average disk positioning time; X: average disk transfer rate; L: concurrency; Z: request size; N: array size in disks

49. Performance and Reliability Modeling
Performance: Kim (response time equations), Kim & Tantawi (approximate service time equations), Chen & Towsley, Lee & Katz
Reliability: Markov models

50. Opportunities for Future Research
Experience with Disk Arrays, Interaction among New Organizations, Scalability, Massively Parallel Computers and Small Disks, Latency
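The parity-logging idea on slide 43 can be sketched as logging the XOR difference of each small write and later folding the accumulated differences into the parity in one batch. The list-based "log" and all names are illustrative assumptions:

```python
# Sketch of parity logging: rather than read-modify-writing the parity
# on every small write, log the parity-update image (old XOR new data)
# and apply the whole log to the parity in one efficient pass.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

log: list[bytes] = []

def log_small_write(old_data: bytes, new_data: bytes) -> None:
    log.append(xor_blocks(old_data, new_data))  # parity update image

def apply_log(parity: bytes) -> bytes:
    """Fold every logged difference into the parity in one pass."""
    for delta in log:
        parity = xor_blocks(parity, delta)
    log.clear()
    return parity

# Two logged small writes to the same block compose correctly:
p = b"\x00"
log_small_write(b"\x01", b"\x02")   # delta 0x03
log_small_write(b"\x02", b"\x04")   # delta 0x06
assert apply_log(p) == b"\x05"      # 0x00 ^ 0x03 ^ 0x06
```

Because XOR is associative, the order of application does not matter, which is what lets the log be flushed as large contiguous transfers instead of many small random parity updates.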