raid+controllers

RAID CONTROLLERS and DATA BACKUP

RAID

• Redundant Array of Independent Disks

• Redundant Array of Inexpensive Disks

• 6 levels in common use

• Not a hierarchy

• Set of physical disks viewed as single logical drive by O/S

• Data distributed across physical drives

• Can use redundant capacity to store parity information

• Each RAID scheme affects reliability and performance in different ways. Every additional disk included in an array increases the likelihood that one will fail, but by using error checking and/or mirroring, the array as a whole can be made more reliable by the ability to survive and recover from a failure.
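
The reliability trade-off above can be put into rough numbers. The sketch below assumes every disk fails independently with the same probability p over some fixed period. The value of p and the function names are illustrative assumptions, not figures from these notes, and the model ignores rebuild time.

```python
def p_any_disk_fails(p: float, n: int) -> float:
    """Probability that at least one of n independent disks fails."""
    return 1.0 - (1.0 - p) ** n


def p_mirror_pair_lost(p: float) -> float:
    """Probability that both disks of a two-disk mirror fail (data is lost)."""
    return p * p


p = 0.05  # assumed per-disk failure probability, for illustration only
for n in (1, 2, 4, 8):
    print(f"{n} disks: P(at least one fails) = {p_any_disk_fails(p, n):.4f}")
print(f"mirror pair: P(both fail) = {p_mirror_pair_lost(p):.4f}")
```

Under these assumptions, the chance that some disk fails grows with the number of disks, while the chance of losing both copies in a mirror is much smaller than p itself.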

RAID 0

• No redundancy

• Data striped across all disks

• Round Robin striping

• Increase speed

– Multiple data requests probably not on same disk

– Disks seek in parallel

– A set of data is likely to be striped across multiple disks
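
Round-robin striping can be sketched as a simple mapping from a logical strip number to a disk and a position on that disk. This is a minimal illustration that assumes disks are numbered from 0; `locate_strip` is a hypothetical helper name.

```python
def locate_strip(strip: int, n_disks: int) -> tuple[int, int]:
    """Map a logical strip number to (disk index, strip offset on that disk)."""
    if strip < 0 or n_disks < 1:
        raise ValueError("strip must be >= 0 and n_disks must be >= 1")
    # Consecutive strips go to consecutive disks, wrapping around.
    return strip % n_disks, strip // n_disks


# With four disks, consecutive strips land on different disks,
# so a request spanning several strips can be served in parallel.
for s in range(8):
    disk, offset = locate_strip(s, 4)
    print(f"strip {s} -> disk {disk}, offset {offset}")
```

With four disks, the first disk ends up holding strips 0, 4, 8, 12 and so on.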

RAID 1

• Mirrored Disks

• Data is striped across disks

• 2 copies of each stripe on separate disks

• Read from either

• Write to both

• Recovery is simple

– Swap faulty disk & re-mirror

– No down time

• Expensive

RAID 2

• Disks are synchronized

• Very small stripes

– Often single byte/word

• Error correction calculated across corresponding bits on disks

• Multiple parity disks store Hamming code error correction in corresponding positions

• Lots of redundancy

– Expensive

– Not used

RAID 3

• Similar to RAID 2

• Only one redundant disk, no matter how large the array

• Simple parity bit for each set of corresponding bits

• Data on failed drive can be reconstructed from surviving data and parity info

• Very high transfer rates

RAID 4

• Each disk operates independently

• Good for high I/O request rate

• Large stripes

• Bit by bit parity calculated across stripes on each disk

• Parity stored on parity disk

RAID 5

• Like RAID 4

• Parity striped across all disks

• Round robin allocation for parity stripe

• Avoids RAID 4 bottleneck at parity disk

• Commonly used in network servers

• N.B. DOES NOT MEAN 5 DISKS!!!!!

RAID-0

[Figure: RAID-0 strip layout; the first disk holds Strip 0, Strip 4, Strip 8, Strip 12]

• Striped, non-redundant

• Parallel access to multiple disks

• Excellent data transfer rate (for small strips)

• Excellent I/O request processing rate (for large strips)

• Typically used for applications requiring high performance for non-critical data

RAID-1

• Mirrored/replicated (most costly form of redundancy)

• I/O request rate: good for reads, fair for writes

• Data transfer rate: good for reads; writes slightly slower

– Read can be serviced by the disk with the shorter seek distance

– Write must be handled by both disks

• Typically used in system drives and critical files

– Banking, insurance data

– Web (e-commerce) servers
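
The read and write rules for a mirrored pair can be sketched with a toy model. Everything here is an assumption made for illustration: the class name `MirroredPair`, the idea of tracking one head position per disk, and measuring seek distance as the difference between track numbers.

```python
class MirroredPair:
    """Toy model of two mirrored disks, tracking each disk's head position."""

    def __init__(self) -> None:
        self.head = [0, 0]      # current head track of each disk (assumed model)
        self.data = [{}, {}]    # track -> value, one dict per disk

    def write(self, track: int, value: str) -> None:
        # A write must be handled by both disks.
        for d in (0, 1):
            self.data[d][track] = value
            self.head[d] = track

    def read(self, track: int) -> tuple[int, str]:
        # A read is served by the disk with the shorter seek distance.
        d = min((0, 1), key=lambda i: abs(self.head[i] - track))
        self.head[d] = track
        return d, self.data[d][track]


pair = MirroredPair()
pair.write(10, "a")
pair.write(90, "b")
print(pair.read(10))  # both heads are equally far away; disk 0 is chosen
print(pair.read(90))  # disk 1 is still at track 90, so it serves this read
```

Because either copy can serve a read, two reads to distant tracks do not force one head to travel back and forth, which is why reads fare better than writes.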

Combining RAID-0 and RAID-1

• Can combine RAID-0 and RAID-1:

– Mirrored stripes (RAID 0+1, or RAID 01); the slide's figure shows this arrangement

– Striped Mirrors (RAID 1+0, or RAID 10)

• Data transfer rate: good for reads and writes

• Reliability: good

• Efficiency: poor (100% overhead in terms of disk utilization)

RAID-2

[Figure: data disks b0 b1 b2 b3 and check disks f0(b) f1(b) f2(b)]

• Hamming codes capable of detecting two or more erasures

– E.g., single error-correcting, double error-detecting (SEC-DED)

• Problem with small writes (similar to DRAM cycle time/access time)

• Poor I/O request rate

• Excellent data transfer rate

RAID-3

[Figure: data disks b0 b1 b2 b3 and parity disk P(b)]

• Fine-grained (bit) interleaving with parity

– E.g., parity = sum modulo 2 (XOR) of all bits

• Disks are synchronized, parity computed by disk controller

• When one disk fails… (how do you know?)

– Data is recovered by subtracting all data in good disks from parity disk

– Recovering from failures takes longer than in mirroring, but failures are rare, so this is acceptable

– Hot spares used to reduce vulnerability in reduced mode

• Performance:

– Poor I/O request rate

– Excellent data transfer rate

– Typically used in large I/O request size applications, such as imaging or CAD

RAID-4

• Coarse-grained striping with parity

• Unlike RAID-3, not all disks need to be read on each write

– New parity computed by computing difference between old and new data

• Drawback:

– Like RAID-3, parity disk involved in every write; serializes small writes

• I/O request rate: excellent for reads, fair for writes

• Data transfer rate: good for reads, fair for writes

[Figure: RAID-4 layout; the first data disk holds Blk 0, Blk 4, Blk 8, Blk 12; the dedicated parity disk holds P(0-3), P(4-7), P(8-11), P(12-15)]
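
The small-write rule for RAID-4 (new parity from the difference between old and new data) can be sketched with XOR on integers. The block values and function names below are made up for illustration. The point is that only the old data block and the old parity block need to be read, not the whole stripe.

```python
from functools import reduce
from operator import xor


def full_parity(blocks: list[int]) -> int:
    """Parity of a whole stripe: XOR of all its data blocks."""
    return reduce(xor, blocks, 0)


def update_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """Parity after a small write, using only the old data and old parity."""
    return old_parity ^ old_data ^ new_data


stripe = [0x0F, 0x33, 0x55, 0xAA]   # illustrative data blocks
parity = full_parity(stripe)

new_value = 0x5A
parity = update_parity(parity, stripe[2], new_value)  # read old data + old parity
stripe[2] = new_value                                 # write new data + new parity

print(parity == full_parity(stripe))  # incremental update matches a full recompute
```

Every such write still touches the single parity disk, which is the bottleneck named in the drawback above.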

RAID-5

Disk 1    Disk 2     Disk 3     Disk 4     Disk 5
Blk 0     Blk 1      Blk 2      Blk 3      P(0-3)
Blk 4     Blk 5      Blk 6      P(4-7)     Blk 7
Blk 8     Blk 9      P(8-11)    Blk 10     Blk 11
Blk 12    P(12-15)   Blk 13     Blk 14     Blk 15

• Key Idea: reduce load on parity disk

• Block-interleaved distributed parity

• Multiple writes can occur simultaneously

– Block 0 can be accessed in parallel with Block 5

– First needs disks 1 and 5; second needs disks 2 and 4

• I/O request rate: excellent for reads, good for writes

• Data transfer rate: good for reads, good for writes

• Typically used for high request rate, read-intensive data lookup
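
The parallel-write example above can be checked with a small mapping function. This sketch assumes the five-disk layout of the slide's figure, in which the parity block starts on disk 5 for the first stripe and moves one disk to the left on each following stripe. Real controllers use several different rotation schemes, and the function names are hypothetical.

```python
def locate_block(block: int, n_disks: int = 5) -> tuple[int, int]:
    """Return (data disk, parity disk), both 1-based, for a logical block."""
    stripe, index = divmod(block, n_disks - 1)
    # Parity rotates: disk 5 for stripe 0, disk 4 for stripe 1, and so on.
    parity_disk = n_disks - (stripe % n_disks)
    data_disks = [d for d in range(1, n_disks + 1) if d != parity_disk]
    return data_disks[index], parity_disk


def can_write_in_parallel(a: int, b: int, n_disks: int = 5) -> bool:
    """Two small writes can overlap if they touch disjoint sets of disks."""
    return not set(locate_block(a, n_disks)) & set(locate_block(b, n_disks))


print(locate_block(0))              # data disk and parity disk for Block 0
print(locate_block(5))              # data disk and parity disk for Block 5
print(can_write_in_parallel(0, 5))  # disjoint disks, so the writes can overlap
print(can_write_in_parallel(0, 1))  # same stripe, so both need the same parity disk
```

Under this layout Block 0 uses disks 1 and 5 and Block 5 uses disks 2 and 4, matching the slide, whereas two blocks in the same stripe always collide on the parity disk.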

RAID-6

• Striped set with dual distributed parity.

• Provides fault tolerance from two drive failures; array continues to operate with up to two failed drives. This makes larger RAID groups more practical, especially for high availability systems. This becomes increasingly important because large-capacity drives lengthen the time needed to recover from the failure of a single drive.

Nesting RAID Levels

• When nesting RAID levels, a RAID type that provides redundancy is typically combined with RAID 0 to boost performance. With these configurations it is preferable to have RAID 0 on top and the redundant array at the bottom, because fewer disks then need to be regenerated when a disk fails.
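
One way to see why the nesting order matters is to count which disk failures each arrangement survives. The sketch below is a simplified model with an assumed disk numbering. With RAID 0 on top (RAID 1+0), disks (0,1), (2,3), ... form mirror pairs. With the mirror on top (RAID 0+1), the first half of the disks is one stripe set and the second half is its mirror, and a stripe set with any failed member is treated as lost in its entirety.

```python
from itertools import combinations


def survives_raid10(failed: set[int], n: int) -> bool:
    """RAID 1+0 survives unless both disks of some mirror pair have failed."""
    return all(not {2 * i, 2 * i + 1} <= failed for i in range(n // 2))


def survives_raid01(failed: set[int], n: int) -> bool:
    """RAID 0+1 survives only if at least one stripe set is fully intact."""
    half = n // 2
    set_a, set_b = set(range(half)), set(range(half, n))
    return not (failed & set_a) or not (failed & set_b)


def count_survivable(survives, n: int, k: int) -> int:
    """Number of k-disk failure combinations that the array survives."""
    return sum(survives(set(c), n) for c in combinations(range(n), k))


n = 4
print("RAID 1+0, two failures survived:", count_survivable(survives_raid10, n, 2))
print("RAID 0+1, two failures survived:", count_survivable(survives_raid01, n, 2))
```

In this model a single failed disk in RAID 1+0 affects only its mirror partner, while in RAID 0+1 it takes a whole stripe set out of service, so that set has to be regenerated.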

RAID 01 and RAID 10

• The minimum number of disks required to implement this level of RAID is 4. The difference between RAID 0+1 and RAID 1+0 is the location of each RAID system: RAID 0+1 is a mirror of stripes, while RAID 1+0 is a stripe of mirrors. The size of a RAID 0+1 array can be calculated as follows, where n is the number of drives (must be even) and c is the capacity of the smallest drive in the array:

Size = (n × c) / 2
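
The size formula can be written as a short function. This is a minimal sketch; the function name and the example capacities (in GB) are illustrative assumptions.

```python
def raid01_size(capacities: list[float]) -> float:
    """Usable size of a RAID 0+1 array: (n x c) / 2, with c the smallest drive."""
    n = len(capacities)
    if n < 4 or n % 2 != 0:
        raise ValueError("RAID 0+1 needs an even number of drives, at least 4")
    c = min(capacities)  # every drive contributes only as much as the smallest one
    return n * c / 2


print(raid01_size([500, 500, 500, 500]))            # four equal drives
print(raid01_size([500, 750, 1000, 500, 500, 500])) # larger drives are partly unused
```

Half of the raw capacity goes to the mirror copy, which is the 100% overhead mentioned earlier for combined RAID-0 and RAID-1.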

RAID level 30 is also known as striping of dedicated parity arrays. It is a combination of RAID level 3 and RAID level 0. RAID 30 provides high data transfer rates, combined with high data reliability. RAID 30 is best implemented on two RAID 3 disk arrays with data striped across both disk arrays. RAID 30 breaks up data into smaller blocks, and then stripes the blocks of data to each RAID 3 set. RAID 3 breaks up data into smaller blocks, calculates parity by performing an Exclusive OR on the blocks, and then writes the blocks to all but one drive in the array. The parity bit created using the Exclusive OR is then written to the last drive in each RAID 3 array. The size of each block is determined by the stripe size parameter, which is set when the RAID is created.

RAID 50

RAID 100

Rebuilding Failed Drives

Parity Calculation

Parity data in a RAID environment is calculated using the Boolean XOR function. For example, here is a simple RAID 4 three-disk setup consisting of two drives that hold 8 bits of data each and a third drive that will be used to hold parity data.

Drive 1: 01101101
Drive 2: 11010100

To calculate parity data for the two drives, an XOR is performed on their data, i.e.

01101101 XOR 11010100 = 10111001

The resulting parity data, 10111001, is then stored on Drive 3, the dedicated parity drive.

Should any of the three drives fail, the contents of the failed drive can be reconstructed on a replacement (or "hot spare") drive by subjecting the data from the remaining drives to the same XOR operation. If Drive 2 were to fail, its data could be rebuilt using the XOR results of the contents of the two remaining drives, Drive 3 and Drive 1:

Drive 3: 10111001
Drive 1: 01101101

i.e. 10111001 XOR 01101101 = 11010100
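
The worked example can be reproduced directly with Python's `^` (XOR) operator. This is a minimal sketch; the helper name `xor_blocks` is an assumption, and each drive is modeled as a single 8-bit integer.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks: int) -> int:
    """XOR any number of equally sized blocks together."""
    return reduce(xor, blocks, 0)


drive1 = 0b01101101
drive2 = 0b11010100

# Parity stored on Drive 3.
drive3 = xor_blocks(drive1, drive2)
print(f"parity:  {drive3:08b}")

# Drive 2 fails: rebuild it from the surviving drives.
rebuilt = xor_blocks(drive3, drive1)
print(f"rebuilt: {rebuilt:08b}")
```

The same function rebuilds any single lost drive, because XOR-ing the parity with all surviving data cancels everything except the missing block.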

Hamming Code

• There's an error correction code that separates the bits holding the original value (data bits) from the error correction bits (check bits), and the difference between the calculated and actual error correction bits is the position of the bit that's wrong

• For M data bits and K check bits, we must have:

2^K – 1 >= M + K

• Calculate Check bits for M = 8?
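
The inequality can be solved by trying successive values of K. The sketch below is one straightforward way to do so; `min_check_bits` is a hypothetical helper name.

```python
def min_check_bits(m: int) -> int:
    """Smallest K such that 2**K - 1 >= M + K for M data bits."""
    if m < 1:
        raise ValueError("need at least one data bit")
    k = 1
    while 2 ** k - 1 < m + k:
        k += 1
    return k


for m in (4, 8, 16, 32):
    print(f"M = {m}: K = {min_check_bits(m)}")
```

For M = 8, K = 3 is not enough (7 < 11) but K = 4 is (15 >= 12), so an 8-bit value is stored as a 12-bit word.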

Hamming Code Calculation

Bit position        12    11    10    9     8     7     6     5     4     3     2     1
Position (binary)   1100  1011  1010  1001  1000  0111  0110  0101  0100  0011  0010  0001
Data bits           D8    D7    D6    D5          D4    D3    D2          D1
Check bits                                  C4                      C3          C2    C1

Check bits occupy the positions that are powers of 2 (1, 2, 4, 8); data bits fill the remaining positions.

Calculate the Hamming word for data = 00111001
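
The exercise can be checked with a short encoder. This sketch assumes the usual layout for this code: bit positions are numbered from 1 on the right, check bits sit at the power-of-2 positions, the rightmost data bit is D1, and each check bit is the XOR of the data bits whose position number includes that power of 2. The function names are hypothetical.

```python
def hamming_encode(data: str) -> str:
    """Encode a bit string (highest data bit first) into a Hamming word."""
    m = len(data)
    k = 1
    while 2 ** k - 1 < m + k:          # number of check bits needed
        k += 1
    n = m + k
    word = [0] * (n + 1)               # 1-based positions; index 0 unused
    data_bits = iter(int(b) for b in reversed(data))  # D1 first
    for pos in range(1, n + 1):
        if pos & (pos - 1):            # not a power of 2, so a data position
            word[pos] = next(data_bits)
    for c in (1 << i for i in range(k)):
        for pos in range(1, n + 1):
            if pos & c and pos != c:   # data positions covered by this check bit
                word[c] ^= word[pos]
    return "".join(str(word[pos]) for pos in range(n, 0, -1))


def error_position(word: str) -> int:
    """Syndrome: 0 if the word is consistent, else the position of the bad bit."""
    n = len(word)
    syndrome = 0
    for i, bit in enumerate(word):
        if bit == "1":
            syndrome ^= n - i          # position of this bit, counted from the right
    return syndrome


stored = hamming_encode("00111001")
print(stored)                          # 001101001111
print(error_position(stored))          # 0, no error

corrupted = stored[:6] + ("1" if stored[6] == "0" else "0") + stored[7:]
print(error_position(corrupted))       # 6, the flipped bit is at position 6
```

The second function illustrates the earlier statement that the mismatch between calculated and stored check bits gives the position of the wrong bit. It handles single-bit errors only.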

RAID is not Backup

• A RAID system used as a main drive is not a replacement for backing up data. Data may become damaged or destroyed without harm to the drive(s) on which they are stored. For example, some of the data may be overwritten by a system malfunction; a file may be damaged or deleted by user error or malice and not noticed for days or weeks. RAID can also be overwhelmed by catastrophic failure that exceeds its recovery capacity and, of course, the entire array is at risk of physical damage by fire, natural disaster, or human forces.

Classes of RAID

• Failure-resistant disk systems (FRDS) (meets a minimum of criteria 1 - 6):

– Protection against data loss and loss of access to data due to disk drive failure

– Reconstruction of failed drive content to a replacement drive

– Protection against data loss due to a "write hole"

– Protection against data loss due to host and host I/O bus failure

– Protection against data loss due to replaceable unit failure

– Replaceable unit monitoring and failure indication

• Failure-tolerant disk systems (FTDS) (meets a minimum of criteria 7 - 15):

– Disk automatic swap and hot swap

– Protection against data loss due to cache failure

– Protection against data loss due to external power failure

– Protection against data loss due to a temperature out of operating range

– Replaceable unit and environmental failure warning

– Protection against loss of access to data due to device channel failure

– Protection against loss of access to data due to controller module failure

– Protection against loss of access to data due to cache failure

– Protection against loss of access to data due to power supply failure

• Disaster-tolerant disk systems (DTDS) (meets a minimum of criteria 16 - 21):

– Protection against loss of access to data due to host and host I/O bus failure

– Protection against loss of access to data due to external power failure

– Protection against loss of access to data due to component replacement

– Protection against loss of data and loss of access to data due to multiple disk failure

– Protection against loss of access to data due to zone failure

– Long-distance protection against loss of data due to zone failure
