CSE521: Introduction to Computer Architecture
Mazin Yousif
I/O Subsystem
RAID (Redundant Array of Independent Disks)
MSY F02 2
RAID
• Improvements in microprocessor performance (~50%/year) far exceed those in disk access time (~10%/year), which depends on a mechanical system
• Improvements in magnetic media densities have also been slow (~20%/year)
• Solution: Disk Arrays: Uses Parallelism between Multiple Disks to Improve Aggregate I/O Performance
– Disk Arrays stripe data across multiple disks and access them in parallel
• Capacity Penalty to store redundant data
• Bandwidth Penalty to update it
• Positive Aspects of Disk Arrays:
– Higher data transfer rate on large data accesses
– Higher I/O rates on small data accesses
– Uniform load balancing across all the disks - no hot spots (Hopefully)
• Negative Aspects of Disk Arrays:
– Higher vulnerability to disk failures - need to employ redundancy in the form of error-correcting codes to tolerate failures
• Several Data Striping and Redundancy Schemes
• Sequential accesses generate the highest data transfer rates with minimal head positioning
• Random accesses generate high I/O rates with lots of head positioning
• Data is Striped for improved performance
– Distributes data over multiple disks to make them appear as a single fast, large disk
– Allows multiple I/Os to be serviced in parallel
• Multiple independent requests are serviced in parallel
• A single block request may be serviced in parallel by multiple disks
• Data is Redundant for improved reliability
– The large number of disks in an array lowers the reliability of the array
• Reliability of N disks = Reliability of 1 disk / N
• Example:
– 50,000 hours / 70 disks = ~700 hours
– Disk system MTTF drops from ~6 years to ~1 month
– Arrays without redundancy are too unreliable to be useful
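The arithmetic above can be sketched directly. This is a minimal model assuming independent disk failures, so the array MTTF simply divides by the number of disks:

```python
# Simple reliability model: with N independent disks and no redundancy,
# the array fails as soon as any one disk fails, so MTTF divides by N.
def array_mttf(disk_mttf_hours: float, n_disks: int) -> float:
    return disk_mttf_hours / n_disks

print(round(array_mttf(50_000, 70)))  # ~714 hours, roughly one month
```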
• RAID 0 (Non-redundant)
– Stripes data but does not employ redundancy
– Lowest cost of any RAID
– Best Write performance - no redundant information
– Any single disk failure is catastrophic
– Used in environments where performance is more important than reliability.
RAID 0 data layout - each row is one stripe; each cell is one stripe unit:

Disk 1   Disk 2   Disk 3   Disk 4
D0       D1       D2       D3      <- Stripe
D4       D5       D6       D7
D8       D9       D10      D11
D12      D13      D14      D15
D16      D17      D18      D19
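The layout above implies a simple address mapping from logical block number to physical location. A minimal sketch (the function name is illustrative; stripe unit = one block is assumed):

```python
def raid0_map(block: int, n_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, stripe number), 0-based."""
    return block % n_disks, block // n_disks

# In the 4-disk layout above, D5 lands on the second disk, second stripe
print(raid0_map(5, 4))  # (1, 1)
```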
• RAID 1 (Mirrored)
– Uses twice as many disks as non-redundant arrays - 100% capacity overhead - two copies of data are maintained
– Data is simultaneously written to both copies
– Data is read from the copy with the shorter queuing, seek and rotation delays - best read performance
– When a disk fails, mirrored copy is still available
– Used in environments where availability and performance (I/O rate) are more important than storage efficiency.
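The read-selection policy above can be sketched with queue length as a simple stand-in for total positioning delay (a simplification; real controllers also model seek and rotation):

```python
def pick_mirror(queue_lengths: list[int]) -> int:
    """Choose the replica with the shortest request queue - a proxy for
    'shorter queuing, seek and rotation delays' in a mirrored pair."""
    return min(range(len(queue_lengths)), key=queue_lengths.__getitem__)

print(pick_mirror([3, 1]))  # read from mirror 1, which has less work queued
```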
• RAID 2 (Memory-Style ECC)
– Uses a Hamming code - parity for distinct overlapping subsets of data
– # of redundant disks is proportional to the log of the total # of disks - better for a large # of disks - e.g., 4 data disks require 3 redundant disks
– If disk fails, other data in subset is used to regenerate lost data
– Multiple redundant disks are needed to identify faulty disk
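The redundant-disk count follows the standard single-error-correcting Hamming bound: the smallest k with 2^k >= N + k + 1. A quick sketch:

```python
def hamming_check_disks(n_data: int) -> int:
    """Smallest k such that 2**k >= n_data + k + 1, i.e. enough check
    disks for a single-error-correcting Hamming code over the array."""
    k = 0
    while 2 ** k < n_data + k + 1:
        k += 1
    return k

print(hamming_check_disks(4))  # 3, matching the slide's example
```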
• RAID 3 (Bit-Interleaved Parity)
– Data is bit-wise interleaved over the data disks
– Uses a single parity disk to tolerate disk failures - overhead is 1/N
– Logically a single high-capacity, high-transfer-rate disk
– Reads access data disks only; writes access both data and parity disks
– Used in environments that require high BW (scientific computing, image processing, etc.), not high I/O rates
(Figure: bit-interleaved data spread across the data disks, with the bit-wise parity stored on a dedicated parity disk)
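The parity disk holds the XOR across corresponding bits of the data disks, which is what makes single-disk reconstruction possible. A minimal sketch:

```python
def parity(bits):
    """XOR across one bit position of all data disks."""
    p = 0
    for b in bits:
        p ^= b
    return p

data = [1, 0, 1, 1]                  # one bit from each of four data disks
p = parity(data)                     # stored on the parity disk
# disk 2 fails: regenerate its bit from the survivors plus parity
rebuilt = parity([data[0], data[1], data[3], p])
print(rebuilt == data[2])  # True
```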
• RAID 4 (Block-Interleaved Parity)
– Similar to the bit-interleaved parity disk array, except data is block-interleaved (striping units)
– Read requests smaller than one striping unit access only one striping unit
– Write requests update the data block and the parity block
– Generating parity requires 4 I/O accesses (RMW)
– The parity disk is updated on every write - a bottleneck
• RAID 5 (Block-Interleaved Distributed Parity)
– Eliminates the parity disk bottleneck of RAID 4 - distributes parity among all the disks
– Data is distributed among all disks
– All disks participate in read requests - better performance than RAID 4
– Write requests update the data block and the parity block
– Generating parity requires 4 I/O accesses (RMW)
– Left-symmetric vs. right-symmetric parity placement - allows each disk to be traversed once before any disk is traversed twice
RAID 5 data layout - parity rotates across the disks, one row per stripe:

Disk 1   Disk 2   Disk 3   Disk 4   Disk 5
D0       D1       D2       D3       P       <- Stripe
D4       D5       D6       P        D7
D8       D9       P        D10      D11
D12      P        D13      D14      D15
P        D16      D17      D18      D19
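The rotation shown above can be generated programmatically. This sketch reproduces one rotation scheme (parity shifts one disk to the left each stripe, data filling the remaining slots in order); real controllers offer several left/right, symmetric/asymmetric variants:

```python
def raid5_layout(n_stripes: int, n_disks: int):
    """Build a stripe map: parity moves one disk to the left each stripe;
    data blocks fill the remaining slots in order."""
    rows, block = [], 0
    for s in range(n_stripes):
        pdisk = (n_disks - 1 - s) % n_disks   # parity disk for this stripe
        row = []
        for d in range(n_disks):
            if d == pdisk:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows

for row in raid5_layout(5, 5):
    print(" ".join(row))
```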
RAID 5 small-write (read-modify-write) sequence for updating D0 to D0':

1. Read old data D0
2. Read old parity P
3. Write new data D0'
4. Write new parity P' (the old data and new data are XOR'd into the old parity)
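The parity update rule behind these four I/Os is a pure XOR identity: the new parity equals the old parity with the old data removed and the new data folded in. A minimal sketch:

```python
def raid5_small_write(old_data: int, old_parity: int, new_data: int) -> int:
    """RMW parity rule behind the 4 I/Os (2 reads + 2 writes):
    new parity = old parity XOR old data XOR new data."""
    return old_parity ^ old_data ^ new_data

stripe = [0x1A, 0x2B, 0x3C]                    # three data blocks (one byte each)
p = stripe[0] ^ stripe[1] ^ stripe[2]          # full-stripe parity
new_p = raid5_small_write(stripe[1], p, 0x5D)  # update the middle block
print(new_p == stripe[0] ^ 0x5D ^ stripe[2])   # True: matches recomputed parity
```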
• RAID 6 (P + Q Redundancy)
– Uses Reed-Solomon codes to protect against up to 2 disk failures
– Data is distributed among all disks
– Two sets of parity, P & Q
– Write requests update the data block and both parity blocks
– Generating parity requires 6 I/O accesses (RMW) - must update both P & Q
– Used in environments with stringent reliability requirements
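A minimal per-byte sketch of P + Q redundancy, in the style used by common RAID 6 implementations over GF(2^8). The generator g = 2 and polynomial 0x11D are assumptions for illustration; the slide's Reed-Solomon scheme may differ in detail:

```python
def gf_mul2(x: int) -> int:
    """Multiply by the generator g = 2 in GF(2^8) (polynomial 0x11D assumed)."""
    x <<= 1
    return (x ^ 0x11D) & 0xFF if x & 0x100 else x

def pq(blocks):
    """P = XOR of the data; Q = sum of g**i * D[i] over GF(2^8),
    evaluated per byte with Horner's rule."""
    p = q = 0
    for d in reversed(blocks):
        p ^= d
        q = gf_mul2(q) ^ d
    return p, q

data = [0x10, 0x22, 0x34]
p, q = pq(data)
# P alone recovers any single lost block; Q carries independent
# information so two simultaneous failures remain solvable
print(p ^ data[0] ^ data[2] == data[1])  # True
```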
• Comparisons
– Read/Write Performance
• RAID 0 provides the best write performance
• RAID 1 provides the best read performance
– Cost - Total # of Disks
• RAID 1 is most expensive - 100% capacity overhead - 2N disks
• RAID 0 is least expensive - N disks - no redundancy
• RAID 2 needs N + ceiling(log2 N) + 1 disks
• RAID 3, RAID 4 & RAID 5 need N + 1 disks
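The disk counts above can be collected into one small sketch (RAID 6's N + 2 is added from the P + Q slide; the function name is illustrative):

```python
import math

def total_disks(level: int, n_data: int) -> int:
    """Total disks needed for N data disks, per the counts above."""
    if level == 0:
        return n_data                                      # no redundancy
    if level == 1:
        return 2 * n_data                                  # full mirror
    if level == 2:
        return n_data + math.ceil(math.log2(n_data)) + 1   # Hamming check disks
    if level in (3, 4, 5):
        return n_data + 1                                  # one parity disk
    if level == 6:
        return n_data + 2                                  # P and Q
    raise ValueError(level)

print([total_disks(lvl, 4) for lvl in (0, 1, 2, 5, 6)])  # [4, 8, 7, 5, 6]
```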
Comparisons
• Preferred Environments
– RAID 0: Performance & capacity are more important than reliability
– RAID 1: High I/O rate, high availability environments
– RAID 2: Large I/O Data Transfer
– RAID 3: High BW Applications (Scientific, Image Processing…)
– RAID 4: High bit BW Applications
– RAID 5 & RAID 6: Mixed Applications
• Performance:
– What metric?
• IOPS ?
• Byte/sec ?
• Response Time ?
• IOPS per $$ ?
• Hybrid ?
– Application Dependent
• Transaction Processing: IOPS per $$
• Scientific Applications: Bytes/sec per $$
• File Servers: Both IOPS and Bytes/sec
• Time-Sharing Applications: User Capacity per $$
The table below shows throughput per $$ relative to RAID 0, assuming G drives in an error-correcting group:

RAID Level | Small Reads | Small Writes  | Large Reads | Large Writes | Storage Efficiency
RAID 0     | 1           | 1             | 1           | 1            | 1
RAID 1     | 1           | 1/2           | 1           | 1/2          | 1/2
RAID 3     | 1/G         | 1/G           | (G-1)/G     | (G-1)/G      | (G-1)/G
RAID 5     | 1           | max(1/G, 1/4) | 1           | (G-1)/G      | (G-1)/G
RAID 6     | 1           | max(1/G, 1/6) | 1           | (G-2)/G      | (G-2)/G

* RAID 3 performance/cost is always <= RAID 5 performance/cost
Performance Issues
• Improving Small Write Performance for RAID 5:
– Writes need 4 I/O accesses; the overhead is emphasized for small writes
• Response time increases by a factor of 2; throughput decreases by a factor of 4
• In contrast, RAID 1 writes require two concurrent writes - latency may increase; throughput decreases by a factor of 2
• Three techniques improve RAID 5 small-write performance
• Buffering & Caching:
– The disk cache (write buffering) acknowledges the host before data is written to disk
– Under high load, write-backs increase & response time goes back to 4 times RAID 0
– During write-back, group sequential writes together
– Keep a copy of the old data before writing ==> 3 I/O accesses
– Keep the new parity & new data in cache; any later updates will require only 2 I/O accesses
• Floating Parity:
– Shortens the RMW of small writes to an average of 1 I/O access
– Clusters parity into cylinders, each containing a track of free blocks
– When parity needs updating, the new parity is written on the closest unallocated block following the old parity
• The parity update then costs approximately one read plus 1 ms
– Overhead: directories for unallocated blocks and parity blocks, kept in a cache in the RAID adapter - requires Mbytes of memory
– Floating Data??
• Larger directories
• Sequential data may become discontinuous on disk
• Parity Logging:
– Delay writing the new parity
– Create an "update image" - the difference between the old & new parity - and store it in a log file in the RAID adapter
– Hopefully, several parity blocks can be grouped together when writing back
– The log file is stored in NVRAM - can extend NVRAM to disk space
– May require more I/Os, but efficient since large chunks of data are processed
– Logging reduces I/O accesses for small writes from 4 to possibly 2+
– Overhead: NVRAM, extra disk space, and memory when applying the parity update image to the old parity
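The logging idea above can be sketched as follows. The class and method names are illustrative; the key property is that XOR update images compose, so several small writes against the same parity block collapse into one delta applied in a single batched pass:

```python
class ParityLog:
    """Sketch of parity logging: buffer XOR 'update images' per parity
    block, then apply them to on-disk parity in one batch."""
    def __init__(self):
        self.pending = {}                       # parity block id -> accumulated XOR

    def record(self, pblock: int, old_data: int, new_data: int) -> None:
        # update image = old_data XOR new_data; XORs compose, so repeated
        # small writes to one stripe merge into a single delta
        self.pending[pblock] = self.pending.get(pblock, 0) ^ old_data ^ new_data

    def apply(self, parity: dict) -> None:
        for pb, delta in self.pending.items():  # one large sequential write-back
            parity[pb] ^= delta
        self.pending.clear()

parity = {0: 0b1010}
log = ParityLog()
log.record(0, 0b0011, 0b0101)   # small write: block changes 0011 -> 0101
log.apply(parity)
print(bin(parity[0]))  # 0b1100
```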
Hardware vs. Software RAID
• RAID can be implemented in the OS
– In RAID 1, hardware RAID allows 100% mirroring; OS-implemented mirroring must distinguish between master & slave drives
• Only the master drive has the boot code; if it fails, you can continue working, but no booting is possible
• Hardware mirroring does not have this drawback
– Since software RAIDs implement standard SCSI, repair functions such as support for spare drives and hot plugging have not been implemented; in contrast, hardware RAID implements various repair functions
– Hardware RAID improves system performance with its caching system, especially during high-load situations, and with synchronization
– Microsoft Windows NT implements RAID 0 and RAID 1
• What RAID for which application
– Fast Workstation:
• Caching is important to improve the I/O rate
• If large files are installed, then RAID 0 may be necessary
• It is preferable to put the OS and swap files on drives separate from user drives, to minimize head movement between the swap file area & the user area
– Small Server:
• RAID 1 is preferred
– Mid-Size Server:
• If more capacity is needed, then RAID 5 is recommended
– Large Server (e.g., Database Servers):
• RAID 5 is preferred
• Separate different I/Os into mechanically independent arrays; place database index & data files in different arrays