Chapter 7 Storage Systems
(PowerPoint presentation transcript)
Outline
• Introduction
• Types of Storage Devices
• RAID: Redundant Arrays of Inexpensive Disks
• Errors and Failures in Real Systems
• Benchmarks of Storage Performance and Availability
• Design an I/O System
Introduction
Motivation: Who Cares About I/O?
• CPU performance: 2 times every 18 months
• I/O performance limited by mechanical delays (disk I/O)
– < 10% improvement per year (I/Os per sec or MB per sec)
• Amdahl's Law: system speed-up limited by the slowest part!
– 10% I/O & 10x CPU → 5x performance (lose 50%)
– 10% I/O & 100x CPU → 10x performance (lose 90%)
• I/O bottleneck:
– Diminishing fraction of time spent in the CPU
– Diminishing value of faster CPUs
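The Amdahl's Law figures above can be checked with a short calculation; this is a minimal sketch using the 10%-I/O workload split from the slide.

```python
def speedup(io_fraction, cpu_speedup):
    """Amdahl's Law: only the non-I/O portion (1 - io_fraction) is sped up."""
    return 1.0 / (io_fraction + (1.0 - io_fraction) / cpu_speedup)

# 10% I/O, CPU 10x faster -> about 5x overall (half the potential gain lost)
print(round(speedup(0.10, 10), 2))    # 5.26
# 10% I/O, CPU 100x faster -> about 10x overall (90% of the potential lost)
print(round(speedup(0.10, 100), 2))   # 9.17
```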
Position of I/O in Computer Architecture – Past
• An orphan in the architecture domain
• I/O meant the non-processor, non-memory stuff
– Disk, tape, LAN, WAN, etc.
– Performance was not a major concern
• Devices characterized as: extraneous, non-priority, infrequently used, slow
• The exception is the swap area of the disk
– Part of the memory hierarchy
– Hence part of system performance, but you're hosed if you use it often
Position of I/O in Computer Architecture – Now
• Trends
– I/O is the bottleneck
– Communication is frequent
• Voice response & transaction systems, real-time video
• Multimedia expectations
– Even standard networks come in gigabit/sec flavors
– For multi-computers
• Result
– Significant focus on system bus performance
• The common bridge to the memory system and the I/O systems
• A critical performance component for SMP server platforms
System vs. CPU Performance
• Care about the speed at which user jobs get done
– Throughput: how many jobs per unit time (system view)
– Latency: how quick a single job is (user view)
– Response time: time between when a command is issued and its results appear (user view)
• CPU performance is the main factor when:
– The job mix fits in memory, so there are very few page faults
• I/O performance is the main factor when:
– The job is too big for memory: paging is dominant
– The job reads/writes/creates a lot of unexpected files
• OLTP, decision support, databases
– And then there are graphics & specialty I/O devices
System Performance
• Depends on many factors in the worst case
– CPU
– Compiler
– Operating System
– Cache
– Main Memory
– Memory-IO bus
– I/O controller or channel
– I/O drivers and interrupt handlers
– I/O devices: there are many types
• Level of autonomous behavior
• Amount of internal buffer capacity
• Device specific parameters for latency and throughput
I/O Systems
[Figure: the processor and its cache connect over a memory–I/O bus to main memory and several I/O controllers (disks, graphics, network); controllers signal the processor via interrupts. The memory bus and the I/O bus may be the same or different.]
Keys to a Balanced System
• It's all about overlap of I/O vs. CPU
– Time_workload = Time_CPU + Time_I/O − Time_overlap
• Consider the benefit of just speeding up one– Amdahl’s Law (see P4 as well)
• Latency vs. Throughput
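A quick numeric sketch of the overlap identity above; the job times are made up for illustration.

```python
def workload_time(t_cpu, t_io, t_overlap):
    # Time_workload = Time_CPU + Time_I/O - Time_overlap
    return t_cpu + t_io - t_overlap

# Hypothetical job: 60 s of CPU work and 50 s of I/O
print(workload_time(60, 50, 0))    # no overlap: 110 s
print(workload_time(60, 50, 50))   # I/O fully overlapped: 60 s
```

With full overlap the I/O is hidden entirely behind the CPU time, which is why balanced systems chase overlap before raw device speed.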
I/O System Design Considerations
• Depends on the type of I/O device
– Size, bandwidth, and type of transaction
– Frequency of transactions
– Defer vs. do now
• Appropriate memory bus utilization
• What should the controller do?
– Programmed I/O
– Interrupt vs. polled
– Priority or not
– DMA
– Buffering issues: what happens on over-run
– Protection
– Validation
Types of I/O Devices
• Behavior
– Read, write, or both; once or multiple times
– Size of average transaction
– Bandwidth
– Latency
• Partner (the speed-of-the-slowest-link theory)
– Human operated (interactive or not)
– Machine operated (local or remote)
Is I/O Important?
• Depends on your application
– Business: disks for file system I/O
– Graphics: graphics cards or special co-processors
– Parallelism: the communications fabric
• Our focus = mainline uniprocessing
– Storage subsystems (Chapter 7)
– Networks (Chapter 8)
• Noteworthy point
– The traditional orphan, but now often viewed as a front-line topic
Types of Storage Devices
Magnetic Disks
• Two important roles
– Long-term, non-volatile storage: file system and OS
– Lowest level of the memory hierarchy
• Most of virtual memory is physically resident on the disk
• Long viewed as a bottleneck
– The mechanical system is slow
– Hence disks seem an easy target for improved technology
– Disk improvements w.r.t. density have done better than Moore's Law
Disks are organized into platters (1–12, with 2 recording sides each), tracks (5,000–30,000 per surface), and sectors (100–500 per track). A sector is the smallest unit that can be read or written.
Physical Organization Options
• Platters: one or many
• Density: fixed or variable
– Do all tracks have the same number of sectors?
• Organization: sectors, cylinders, and tracks
– Actuators: 1 or more
– Heads: 1 per track or 1 per actuator
– Access: seek time vs. rotational latency
• Seek time is related to distance, but not linearly
• Typical rotation: 3,600 to 15,000 RPM
• Diameter: 1.0 to 3.5 inches
Typical Physical Organization
• Multiple platters
– Metal disks covered with magnetic recording material on both sides
• Single actuator (since actuators are expensive)
– Single R/W head per arm, one arm per surface
– All heads are therefore over the same cylinder
• Fixed sector size
• Variable density encoding
• Disk controller: usually a built-in processor plus buffering
Anatomy of a Read Access
• Steps
– Memory-mapped I/O over the bus to the controller
– Controller starts the access
– Seek + rotational latency wait
– Sector is read and buffered (validity checked)
– Controller signals ready, or DMAs to main memory and then signals ready
Access Time
• Components
– Seek time: time to move the arm over the proper track
• Very non-linear: acceleration and deceleration times complicate it
– Rotational latency (delay): time for the requested sector to rotate under the head (on average half a rotation: 0.5 / (RPM/60) seconds)
– Transfer time: time to transfer a block of bits (typically a sector) under the read-write head
– Controller overhead: the overhead the controller imposes in performing an I/O access
– Queuing delay: time spent waiting for the disk to become free

Average Access Time = Average Seek Time + Average Rotational Delay + Transfer Time + Controller Overhead + Queuing Delay
Access Time Example
• Assumptions: average seek time 5 ms; transfer rate 40 MB/sec; 10,000 RPM; controller overhead 0.1 ms; no queuing delay
• What is the average time to read or write a 512-byte sector?
• Answer
– Average rotational delay = 0.5 / (10,000 RPM / 60) = 0.003 sec = 3.0 ms
– Transfer time = 0.5 KB / 40.0 MB/sec ≈ 0.013 ms
– Average access time = 5 + 3.0 + 0.013 + 0.1 = 8.1 ms
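The access-time sum above can be packaged as a small helper; this is a sketch that reproduces the slide's 8.1 ms answer from its stated parameters.

```python
def avg_access_time_ms(seek_ms, rpm, sector_bytes, rate_mb_s, overhead_ms, queue_ms=0.0):
    """Average access time = seek + rotational delay + transfer + overhead + queuing."""
    rotational_ms = 0.5 / (rpm / 60.0) * 1000.0          # half a rotation, in ms
    transfer_ms = sector_bytes / (rate_mb_s * 1e6) * 1000.0
    return seek_ms + rotational_ms + transfer_ms + overhead_ms + queue_ms

# 5 ms seek, 10,000 RPM, 512-byte sector, 40 MB/s transfer, 0.1 ms overhead
print(round(avg_access_time_ms(5, 10_000, 512, 40, 0.1), 1))  # 8.1
```

Note how the seek and rotational terms dominate: the 512-byte transfer itself costs barely 0.013 ms.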
Cost VS Performance
• Large-diameter drives have many more data bytes over which to amortize the cost of the electronics → lowest cost per GB
• Higher sales volume → lower manufacturing cost
• The 3.5-inch drive, the largest surviving form factor in 2001, also has the highest sales volume, so it unquestionably has the best price per GB
Future of Magnetic Disks
• Areal density (bits per unit area) is the common improvement metric
• Trends
– Until 1988: 29% improvement per year
– 1988–1996: 60% per year
– 1997–2001: 100% per year
• 2001
– 20 billion bits per square inch
– 60 billion bits per square inch demonstrated in labs
Areal Density = (Tracks / Inch on a disk surface) × (Bits / Inch on a track)
Disk Price Trends by Capacity
Disk Price Trends – Dollars Per MB
Cost VS Access Time for SRAM, DRAM, and Magnetic Disk
Disk Alternatives
• Optical disks
– Optical compact disks (CD): 0.65 GB
– Digital video discs / digital versatile disks (DVD): 4.7 GB × 2 sides
– Rewritable CD (CD-RW) and write-once CD (CD-R)
– Rewritable DVD (DVD-RAM) and write-once DVD (DVD-R)
• Robotic tape storage
• Optical jukeboxes
• Tapes: DAT, DLT
• Flash memory
– Good for embedded systems
– Nonvolatile storage and rewritable ROM
Bus – Connecting I/O Devices to CPU/Memory
I/O Connection Issues
• Shared communication link between subsystems
– The typical choice is a bus
• Advantages
– Shares a common set of wires and protocols → low cost
– Often based on a standard (PCI, SCSI, etc.) → portability and versatility
• Disadvantages
– Poor performance
– Multiple devices imply arbitration, and therefore contention
– Can be a bottleneck

Connecting the CPU to the I/O device world
I/O Connection Issues – Multiple Buses
• I/O bus– Lengthy
– Many types of connected devices
– Wide range in device bandwidth
– Follow a bus standard
– Accept devices varying in latency and bandwidth capabilities
• CPU-memory bus– Short
– High speed
– Match to the memory system to maximize CPU-memory bandwidth
– The designer knows all the types of devices that must connect together
Typical Bus Synchronous Read Transaction
Bus Design Decisions
• Other things to standardize as well
– Connectors
– Voltage and current levels
– Physical encoding of control signals
– Protocols for good citizenship
Bus Design Decisions (Cont.)
• Bus master: a device that can initiate a R/W transaction
– Multiple masters: multiple CPUs and I/O devices initiate bus transactions
– Multiple bus masters need arbitration (fixed priority or random)
• Split transactions for multiple masters
– Use packets for the full transaction (do not hold the bus)
– A read transaction is broken into read-request and memory-reply transactions
– Makes the bus available to other masters while the data is read/written from/to the specified address
– Transactions must be tagged
– Higher bandwidth, but also higher latency
Split Transaction Bus
Bus Design Decisions (Cont.)
• Clocking: synchronous vs. asynchronous
– Synchronous
• Include a clock in the control lines, and a fixed protocol for address and data relative to the clock
• Fast and inexpensive (little or no logic to determine what's next)
• Everything on the bus must run at the same clock rate
• Short length (due to clock skew)
• Used for CPU-memory buses
– Asynchronous
• Easier to connect a wide variety of devices, and to lengthen the bus
• Scales better with technological changes
• Used for I/O buses
Synchronous or Asynchronous?
Standards
• The good
– Lets computer and I/O-device designers work independently
– Provides a path for second-party (e.g. cheaper) competition
• The bad
– Standards become major performance anchors
– They inhibit change
• How a standard gets created
– Bottom-up
• A company tries to get a standards committee to approve its latest philosophy in hopes of getting the jump on the others (e.g. S-bus, PC-AT bus, ...)
• De facto standards
– Top-down
• Design by committee (PCI, SCSI, ...)
Connecting the I/O Bus
• To main memory
– The I/O bus and the CPU-memory bus may be the same
• I/O commands on the bus could then interfere with the CPU's memory accesses
• Since cache misses are rare, this does not tend to stall the CPU
• The real problem is lack of coherency
– This is the case we consider here
• To cache
• Access mechanism
– Memory-mapped I/O or distinct instructions (I/O opcodes)
• Interrupt vs. polling
• DMA or not
– Autonomous control allows overlap and latency hiding
– However, there is a cost impact
A typical interface of I/O devices and an I/O bus to the CPU-memory bus
Processor Interface Issues
• Processor interface
– Interrupts
– Memory-mapped I/O
• I/O control structures
– Polling
– Interrupts
– DMA
– I/O controllers
– I/O processors
• Capacity, access time, bandwidth
• Interconnections
– Buses
I/O Controller
[Figure: an I/O controller. The CPU issues commands and receives interrupts; the controller is selected via an I/O address and reports status such as ready, done, or error.]
Memory Mapped I/O
Single memory & I/O bus, no separate I/O instructions: some portions of the memory address space are assigned to I/O devices, and reads/writes to those addresses cause data transfers.

[Figure: two organizations. Left: a CPU on a single memory & I/O bus shared by memory and peripheral interfaces. Right: a CPU with an L2 cache on a memory bus holding ROM, RAM, and I/O, with a memory bus adaptor bridging to a separate I/O bus.]
Programmed I/O
• Polling
• The I/O module performs the action on behalf of the processor
• But the I/O module does not interrupt the CPU when the I/O is done
• The processor is kept busy checking the status of the I/O module
– Not an efficient way to use the CPU unless the device is very fast!
• Byte by byte...
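The busy-wait loop above can be sketched as follows. The `MockDevice` class is an invented stand-in for a device's status and data registers, purely for illustration; real programmed I/O would read hardware registers instead.

```python
class MockDevice:
    """Hypothetical device model: a status flag plus a one-byte data register."""
    def __init__(self, data):
        self._data = list(data)
    def ready(self):          # status register: is a byte available?
        return bool(self._data)
    def read_byte(self):      # data register: deliver the next byte
        return self._data.pop(0)

def polled_read(dev):
    """Programmed I/O: the CPU busy-waits on the status register, byte by byte."""
    out = bytearray()
    while dev.ready():        # the CPU does no useful work while checking status
        out.append(dev.read_byte())
    return bytes(out)

print(polled_read(MockDevice(b"disk")))  # b'disk'
```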
Interrupt-Driven I/O
• The processor is interrupted when the I/O module is ready to exchange data
• The processor is free to do other work
• No needless waiting
• Still consumes a lot of processor time, because every word read or written passes through the processor and requires an interrupt
• One interrupt per byte
Direct Memory Access (DMA)
• CPU issues request to a DMA module (separate module or incorporated into I/O module)
• DMA module transfers a block of data directly to or from memory (without going through CPU)
• An interrupt is sent when the task is complete
– Only one interrupt per block, rather than one interrupt per byte
• The CPU is only involved at the beginning and end of the transfer
• The CPU is free to perform other tasks during data transfer
Input/Output Processors
[Figure: I/O processor (IOP) organization. The CPU, IOP, memory, and devices D1…Dn share the main memory bus and an I/O bus. (1) The CPU issues an instruction to the IOP: an OP plus a device address saying where the commands are. (2) The IOP looks in memory for its commands, each holding OP (what to do), Addr (where to put data), Cnt (how much), and Other (special requests). (3) Device-to/from-memory transfers are controlled by the IOP directly; the IOP steals memory cycles. (4) The IOP interrupts the CPU when done.]
RAID: Redundant Arrays of Inexpensive Disks
3 Important Aspects of File Systems
• Reliability: is anything broken?
– Redundancy is the main trick for increasing reliability
• Availability: is the system still available to the user?
– When a single point of failure occurs, is the rest of the system still usable?
– ECC and various correction schemes help (but cannot improve reliability)
• Data integrity
– You must know exactly what is lost when something goes wrong
Disk Arrays
• Multiple arms improve throughput, but do not necessarily improve latency
• Striping
– Spreading data over multiple disks
• Reliability
– The general metric: N devices have 1/N the reliability of one
• Rule of thumb: the MTTF of a disk is about 5 years
– Hence we need to add redundant disks to compensate
• MTTR ::= mean time to repair (or replace); hours for disks
• If MTTR is small, then the array's MTTF can be pushed out significantly with a fairly small redundancy factor
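The claim that a small MTTR pushes the array's MTTF out dramatically can be made concrete with the standard back-of-the-envelope formulas. The 5-year disk MTTF is the slide's rule of thumb; the 24-hour MTTR and the mirrored-pair approximation are illustrative assumptions.

```python
HOURS_PER_YEAR = 8760

def array_mttf_no_redundancy(disk_mttf, n):
    # N devices -> 1/N the reliability of one
    return disk_mttf / n

def mirrored_pair_mttf(disk_mttf, mttr):
    # Standard approximation: the pair is lost only if the second disk
    # dies inside the repair window of the first.
    return disk_mttf ** 2 / (2 * mttr)

disk_mttf = 5 * HOURS_PER_YEAR                                   # ~5 years
print(array_mttf_no_redundancy(disk_mttf, 10) / HOURS_PER_YEAR)  # 0.5 years
print(mirrored_pair_mttf(disk_mttf, 24) / HOURS_PER_YEAR)        # 4562.5 years
```

Ten unprotected disks fail twice a year on average, while one mirrored pair with a one-day repair window lasts millennia on paper; fast repair, not just redundancy, is what buys the improvement.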
Data Striping
• Bit-level striping: split the bits of each byte across multiple disks
– The number of disks can be a multiple of 8, or can divide 8
• Block-level striping: blocks of a file are striped across multiple disks; with n disks, block i goes to disk (i mod n) + 1
• Every disk participates in every access
– The number of I/Os per second is the same as for a single disk
– The amount of data per second is improved
• Provides high data-transfer rates, but does not improve reliability
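The block-placement rule above is a one-liner; a short sketch:

```python
def disk_for_block(i, n):
    """Block-level striping: with n disks, block i goes to disk (i mod n) + 1."""
    return (i % n) + 1

# 4-disk array: consecutive blocks rotate round-robin across disks 1..4
print([disk_for_block(i, 4) for i in range(8)])  # [1, 2, 3, 4, 1, 2, 3, 4]
```

A sequential read of 8 blocks thus touches all four disks twice, which is where the improved data rate comes from.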
Redundant Arrays of Disks
• Files are "striped" across multiple disks
• Availability is improved by adding redundant disks
– If a single disk fails, the lost information can be reconstructed from the redundant information
– Capacity penalty to store the redundant information
– Bandwidth penalty to update it
• RAID
– Redundant Arrays of Inexpensive Disks
– Redundant Arrays of Independent Disks
RAID Levels, Reliability, and Redundant-Information Overhead
RAID Levels 0 - 1
• RAID 0: no redundancy (just block striping)
– Cheap, but unable to withstand even a single failure
• RAID 1: mirroring
– Each disk is fully duplicated onto its "shadow"
– Files are written to both; if one fails, flag it and get the data from the mirror
– Reads may be optimized: use whichever disk delivers the data first
– Bandwidth sacrifice on write: one logical write = two physical writes
– Most expensive solution: 100% capacity overhead
– Targeted at high-I/O-rate, high-availability environments
• RAID 0+1: stripe first, then mirror the stripes
• RAID 1+0: mirror first, then stripe the mirrors
RAID Levels 2 & 3
• RAID 2: memory-style ECC
– Cuts down the number of additional disks
– The actual number of redundant disks depends on the correction model
– RAID 2 is not used in practice
• RAID 3: bit-interleaved parity
– Reduces the cost of higher availability to 1/N (N = number of disks)
– Uses one additional redundant disk to hold parity information
– Bit interleaving allows corrupted data to be reconstructed
– An interesting trade-off between increased time to recover from a failure and cost reduction due to decreased redundancy
– Parity = sum of the corresponding disk blocks (modulo 2)
• Hence all disks must be accessed on a write: a potential bottleneck
– Targeted at high-bandwidth applications: scientific computing, image processing
RAID Level 3: Parity Disk (Cont.)

[Figure: a logical record 10010011 is striped across four data disks as physical records 10010011, 11001101, 10010011, and 00110000, with a fifth disk P holding their parity. 25% capacity cost for parity in this configuration (1/N).]
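The parity-disk idea can be demonstrated directly: parity is the modulo-2 sum (XOR) of the data disks, and XOR-ing the survivors with the parity disk reconstructs a lost disk. This sketch uses the bit patterns from the figure.

```python
def xor_bytes(blocks):
    """Modulo-2 sum (XOR) across corresponding bytes of each block."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

data = [bytes([0b10010011]), bytes([0b11001101]),
        bytes([0b10010011]), bytes([0b00110000])]
parity = xor_bytes(data)                     # written to the parity disk P

# Disk 1 fails: rebuild its contents from the survivors plus the parity disk
rebuilt = xor_bytes([data[0], data[2], data[3], parity])
print(rebuilt == data[1])  # True
```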
RAID Levels 4 & 5 & 6
• RAID 4: block-interleaved parity
– Same idea as RAID 3, but the sum is on a per-block basis
– Hence only the parity disk and the target disk need be accessed
– Concurrent writes remain a problem, since the parity disk is a bottleneck
• RAID 5: block-interleaved distributed parity
– Parity blocks are interleaved and distributed across all disks
– Hence parity blocks no longer reside on a single disk
– The probability of write collisions on a single drive is reduced
– Hence higher performance in the concurrent-write situation
• RAID 6
– Similar to RAID 5, but stores extra redundant information to guard against multiple disk failures
Raid 4 & 5 Illustration
[Figure: RAID 4 vs. RAID 5 block layouts.]
Targeted at mixed applications; a logical write becomes four physical I/Os
Small Write Update on RAID 3
Small Writes Update on RAID 4/5
RAID-5 small-write algorithm: to update D0 to D0' in a stripe (D0 D1 D2 D3 P), (1) read the old data D0, (2) read the old parity P, (3) write the new data D0', and (4) write the new parity P' = D0 XOR D0' XOR P.

1 logical write = 2 physical reads + 2 physical writes
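The small-write parity update can be sketched in a few lines; the 3+1 stripe below is invented for the sanity check.

```python
def small_write(old_data, new_data, old_parity):
    """RAID 5 small write: new parity = old data XOR new data XOR old parity.
    Costs two reads (old data, old parity) and two writes (new data, new parity),
    without touching the other data disks in the stripe."""
    new_parity = bytes(od ^ nd ^ op
                       for od, nd, op in zip(old_data, new_data, old_parity))
    return new_data, new_parity

# Sanity check on a 3+1 stripe: after updating D0, the stored parity
# still equals the XOR of all data blocks.
d = [b"\x0f", b"\x33", b"\x55"]
p = bytes(a ^ b ^ c for a, b, c in zip(*d))
new_d0, new_p = small_write(d[0], b"\xf0", p)
print(new_p.hex())  # 96
```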
Errors and Failures in Real Systems
Examples
• Berkeley’s Tertiary Disk• Tandem• VAX• FCC
Berkeley's Tertiary Disk: 18 months of operation. The SCSI backplane, cables, and Ethernet cables were no more reliable than the data disks.
Benchmarks of Storage Performance and Availability
Transaction Processing (TP) Benchmarks
• TP: database applications, OLTP
• Concerned with I/O rate (number of disk accesses per second)
• Started with an anonymous gang of 24 members in 1985
– DebitCredit benchmark: simulates bank tellers and has as its bottom line the number of debit/credit transactions per second (TPS)
• Tighter & more standard benchmark versions followed
– TPC-A, TPC-B
– TPC-C: complex query processing, a more accurate model of a real bank which models credit analysis for loans
– TPC-D, TPC-H, TPC-R, TPC-W
• Must also report the cost per TPS
– Hence the machine configuration is taken into account
TP Benchmarks
TP Benchmark -- DebitCredit
• Disk I/O is random reads and writes of 100-byte records, along with occasional sequential writes
– 2–10 disk I/Os per transaction
– 5,000–20,000 CPU instructions per disk I/O
• Performance relies on...
– The efficiency of the TP software
– How many disk accesses can be avoided by keeping information in main memory (cache) !!! wrong for measuring disk I/O
• Peak TPS
– Restriction: 90% of transactions must have < 2 sec response time
– For reported TPS to increase, the number of tellers and the size of the account file must also increase (more TPS requires more users)
• This ensures that the benchmark really measures disk I/O (not cache...)
Relationship Among TPS, Tellers, and Account File Size
The data set generally must scale in size as the throughput increases
SPEC System-Level File Server (SFS) Benchmark
• SPECsfs: system-level file server benchmark
– A 1990 agreement by 7 vendors to evaluate NFS performance
– Mix of file reads, writes, and file operations
– Writes: 50% done in 8 KB blocks, 50% in partial blocks (1, 2, 4 KB)
– Reads: 85% full-block, 15% partial-block
• Scales the size of the file system according to the reported throughput
– For every 100 NFS operations per second, the capacity must increase by 1 GB
– Limits average response time, e.g. to 40 ms
• Does not normalize for different configurations
• Retired in June 2001 due to bugs
SPECsfs: Unfair Configuration

[Figure: SPECsfs results for an unfair configuration; y-axis is overall response time (ms).]
SPECWeb
• Benchmark for evaluating the performance of WWW servers
• The SPECWeb99 workload simulates accesses to a Web server provider supporting home pages for several organizations
• For each home page, nine files in each of four size classes:
– Less than 1 KB (small icons): 35% of activity
– 1–10 KB: 50% of activity
– 10–100 KB: 14% of activity
– 100 KB–1 MB (large documents and images): 1% of activity
• SPECWeb99 results in 2000 for Dell computers
– Large memory is used as a file cache to reduce disk I/O
– Shows the impact of the Web server software and the OS
SPECWeb99 Results for Dell
Examples of Benchmarks of Dependability and Availability
• TPC-C has a dependability requirement: the system must handle a single disk failure
• Brown and Patterson [2000]
– Focus on the effectiveness of fault tolerance in systems
– Availability can be measured by examining the variations in system QOS metrics over time as faults are injected into the system
– The initial experiment injected a single disk fault
• Software RAID on Linux, Solaris, and Windows 2000
– Reconstructs data onto a hot-spare disk
• A disk emulator injects the faults
• SPECWeb99 workload
Availability Benchmark for Software RAID
[Figure: availability under a single injected disk fault for software RAID on Linux (Red Hat 6.0) and Solaris 7.]
Availability Benchmark for Software RAID (Cont.)
[Figure: the same availability benchmark for software RAID on Windows 2000.]
Availability Benchmark for Software RAID (Cont.)
• The longer the reconstruction (the higher the MTTR), the lower the availability
– Increased reconstruction speed implies decreased application performance
– Linux vs. Solaris and Windows 2000
• RAID reconstruction
– Linux and Solaris: initiate reconstruction automatically
– Windows 2000: reconstruction is initiated manually by operators
• Managing transient faults
– Linux: paranoid
– Solaris and Windows: ignore most transient faults
Designing an I/O System
I/O Design Complexities
• Huge variety of I/O devices
– Latency
– Bandwidth
– Block size
• Expansion is a must: longer buses, more power, and more cabinets
• Balanced performance and cost
• Yet another n-dimensional conflicting-constraint problem
– Yep, it's NP-hard just like all the rest
– Experience plays a big role, since the solutions are heuristic
7 Basic I/O Design Steps
• List the types of I/O devices and buses to be supported
• List the physical requirements of the I/O devices
– Volume, power, bus slots, expansion slots or cabinets, ...
• List the cost of each device and its associated controller
• List the reliability of each I/O device
• Record the CPU resource demands, e.g. cycles to:
– Start, support, and complete an I/O operation
– Cover stalls due to I/O waits
– Cover overhead, e.g. cache flushes and context switches
• List the memory and bus bandwidth demands
• Assess the performance of different ways to organize the I/O devices
– Of course, you'll need queuing theory to get it right
An Example
• Impact on the CPU of reading a disk page directly into the cache
• Assumptions
– 16 KB pages, 64-byte cache blocks
– The addresses of the new page are not in the cache
– The CPU will not access the data in the new page
– 95% of the displaced cache blocks will be read in again (each causing a miss)
– Write-back cache; 50% of blocks are dirty
– The I/O system buffers a full cache block before writing to the cache
– Accesses and misses are spread uniformly over all cache blocks
– No other interference between the CPU and I/O for the cache slots
– 15,000 misses per 1 million clock cycles when there is no I/O
– Miss penalty = 30 CC, plus 30 CC more to write back a dirty block
– 1 page is brought in every 1 million clock cycles
An Example (Cont.)
• Each page fills 16,384 / 64 = 256 cache blocks
• 0.5 × 256 × 30 CC (= 128 × 30) to write the displaced dirty blocks to memory
• 95% × 256 ≈ 244 blocks are referenced again and miss
– All of them are dirty and will need to be written back when replaced
– 244 × 60 more CC
• In total: 128 × 30 + 244 × 60 more CC on top of 1,000,000 + 15,000 × 30 + 7,500 × 30
– About a 1% decrease in performance
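The arithmetic above can be checked directly from the slide's assumptions:

```python
blocks_per_page = 16_384 // 64                 # 256 cache blocks per page
write_displaced = int(0.5 * blocks_per_page) * 30   # 128 dirty displaced blocks * 30 CC
remiss = 244 * 60                              # ~95% of 256 re-read; miss + write-back
extra = write_displaced + remiss               # extra cycles caused by the I/O

base = 1_000_000 + 15_000 * 30 + 7_500 * 30    # cycles per interval without I/O
print(extra, base, round(100 * extra / base, 1))  # 18480 1675000 1.1
```

So each page brought in costs roughly 18,480 extra cycles against a 1,675,000-cycle baseline, i.e. about a 1% slowdown, matching the slide.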
Five More Examples
• Naive cost-performance design and evaluation• Availability of the first example• Response time of the first example• Most realistic cost-performance design and evaluation• More realistic design for availability and its evaluation