CS252 Graduate Computer Architecture, Lecture 18: I/O, Buses, Queuing Theory
John Kubiatowicz, Electrical Engineering and Computer Sciences, University of California, Berkeley
http://www.eecs.berkeley.edu/~kubitron/cs252


Page 1:

CS252 Graduate Computer Architecture

Lecture 18

I/O, Buses, Queuing Theory

John Kubiatowicz
Electrical Engineering and Computer Sciences

University of California, Berkeley

http://www.eecs.berkeley.edu/~kubitron/cs252

Page 2:

Review: Reed-Solomon Codes

$$
\begin{pmatrix} P(1)\\ P(2)\\ P(3)\\ P(4)\\ P(5)\\ P(6)\\ P(7) \end{pmatrix}
=
\underbrace{\begin{pmatrix}
1^0 & 1^1 & 1^2 & 1^3 & 1^4\\
2^0 & 2^1 & 2^2 & 2^3 & 2^4\\
3^0 & 3^1 & 3^2 & 3^3 & 3^4\\
4^0 & 4^1 & 4^2 & 4^3 & 4^4\\
5^0 & 5^1 & 5^2 & 5^3 & 5^4\\
6^0 & 6^1 & 6^2 & 6^3 & 6^4\\
7^0 & 7^1 & 7^2 & 7^3 & 7^4
\end{pmatrix}}_{G}
\begin{pmatrix} a_0\\ a_1\\ a_2\\ a_3\\ a_4 \end{pmatrix}
$$

• Reed-Solomon codes (non-systematic):
– Data as coefficients, code space as values of the polynomial:
– P(x) = a0 + a1·x + a2·x² + a3·x³ + a4·x⁴
– Coded: P(1), P(2), ..., P(6), P(7)
• Called a Vandermonde matrix: maximum rank
• Different representation (this H' and G are not related):
– Clear that all combinations of two or fewer columns are independent: d = 3
– Very easy to pick whatever d you happen to want: add more rows
• Fast, systematic versions of Reed-Solomon:
– Cauchy Reed-Solomon, others

$$
H' = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 2 & 3 & 4 & 5 & 6 & 7
\end{pmatrix}
$$
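As a concrete illustration (a toy example worked over the integers; a real Reed-Solomon code does the same arithmetic in a finite field such as GF(2^8)), encode the data word (a0, ..., a4) = (1, 0, 0, 0, 1), i.e. P(x) = 1 + x⁴:

$$
\bigl(P(1),\dots,P(7)\bigr) = (2,\ 17,\ 82,\ 257,\ 626,\ 1297,\ 2402).
$$

Any 5 of the 7 values determine the degree-4 polynomial by interpolation, so the code survives 2 erasures, consistent with the minimum distance d = n - k + 1 = 7 - 5 + 1 = 3 stated above.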

Page 3:

Motivation: Who Cares About I/O?

• CPU performance: 60% per year
• I/O system performance limited by mechanical delays (disk I/O)
– < 10% per year (I/O per sec or MB per sec)
• Amdahl's Law: system speed-up limited by the slowest part! (worked numbers below)
– 10% I/O & 10x CPU => 5x performance (lose 50%)
– 10% I/O & 100x CPU => 10x performance (lose 90%)
• I/O bottleneck:
– Diminishing fraction of time in CPU
– Diminishing value of faster CPUs
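To make the Amdahl's Law numbers concrete (my arithmetic, assuming the workload starts at 90% CPU time and 10% I/O time and only the CPU part is accelerated by a factor k):

$$
S = \frac{1}{(1-f) + f/k},\quad f=0.9:\qquad
S_{10\times} = \frac{1}{0.1 + 0.9/10} = \frac{1}{0.19} \approx 5.3,\qquad
S_{100\times} = \frac{1}{0.1 + 0.9/100} = \frac{1}{0.109} \approx 9.2.
$$

Even an infinitely fast CPU caps the speedup at 10x, which is why the slide says you "lose" roughly 50% and 90% of the nominal CPU gains.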

Page 4:

I/O Systems

[Diagram: Processor with Cache attached to a Memory-I/O Bus; Main Memory and several I/O Controllers (driving disks, graphics, and network) also sit on the bus; the I/O controllers signal the processor with interrupts.]

Page 5:

What is a bus?

• A bus is:
– a shared communication link
– a single set of wires used to connect multiple subsystems
• A bus is also a fundamental tool for composing large, complex systems
– systematic means of abstraction

[Diagram: Processor (control + datapath), Memory, Input, and Output connected by a single bus.]

Page 6:

Advantages of Buses

• Versatility:
– New devices can be added easily
– Peripherals can be moved between computer systems that use the same bus standard
• Low cost:
– A single set of wires is shared in multiple ways

[Diagram: Processor, Memory, and several I/O Devices attached to one shared bus.]

Page 7:

Disadvantages of Buses

• It creates a communication bottleneck
– The bandwidth of the bus can limit the maximum I/O throughput
• The maximum bus speed is largely limited by:
– The length of the bus
– The number of devices on the bus
– The need to support a range of devices with:
» Widely varying latencies
» Widely varying data transfer rates

[Diagram: Processor, Memory, and several I/O Devices contending for one shared bus.]

Page 8:

The General Organization of a Bus

• Control lines:
– Signal requests and acknowledgments
– Indicate what type of information is on the data lines
• Data lines carry information between the source and the destination:
– Data and addresses
– Complex commands

[Diagram: a bus drawn as a bundle of data lines plus control lines.]

Page 9:

Master versus Slave

• A bus transaction includes two parts:
– Issuing the command (and address): the request
– Transferring the data: the action
• The master is the one who starts the bus transaction by:
– issuing the command (and address)
• The slave is the one who responds to the address by:
– Sending data to the master if the master asks for data
– Receiving data from the master if the master wants to send data

[Diagram: the Bus Master issues the command to the Bus Slave; data can go either way.]

Page 10:

Types of Buses

• Processor-Memory Bus (design specific)
– Short and high speed
– Only needs to match the memory system
» Maximize memory-to-processor bandwidth
– Connects directly to the processor
– Optimized for cache block transfers
• I/O Bus (industry standard)
– Usually lengthy and slower
– Needs to match a wide range of I/O devices
– Connects to the processor-memory bus or backplane bus
• Backplane Bus (standard or proprietary)
– Backplane: an interconnection structure within the chassis
– Allows processors, memory, and I/O devices to coexist
– Cost advantage: one bus for all components

Page 11:

A Computer System with One Bus: Backplane Bus

• A single bus (the backplane bus) is used for:
– Processor-to-memory communication
– Communication between I/O devices and memory
• Advantages: simple and low cost
• Disadvantages: slow, and the bus can become a major bottleneck
• Example: IBM PC-AT

[Diagram: Processor, Memory, and I/O Devices all attached to one backplane bus.]

Page 12:

A Two-Bus System

• I/O buses tap into the processor-memory bus via bus adaptors:
– Processor-memory bus: mainly for processor-memory traffic
– I/O buses: provide expansion slots for I/O devices
• Apple Macintosh II
– NuBus: processor, memory, and a few selected I/O devices
– SCSI bus: the rest of the I/O devices

[Diagram: Processor and Memory on a processor-memory bus; several bus adaptors hang off it, each fronting an I/O bus.]

Page 13:

A Three-Bus System (+ backside cache)

• A small number of backplane buses tap into the processor-memory bus
– Processor-memory bus is used only for processor-memory traffic
– I/O buses are connected to the backplane bus
• Advantage: loading on the processor bus is greatly reduced

[Diagram: Processor (with an L2 cache on a backside cache bus) and Memory on the processor-memory bus; a bus adaptor connects it to a backplane bus, and further bus adaptors connect I/O buses to that backplane.]

Page 14:

The Move from Parallel to Serial I/O

• Shared parallel bus wires
– Clock rate limited by clock skew across a long bus (~100 MHz)
– High power to drive a large number of loaded bus lines
– Central bus arbiter adds latency to each transaction
– Expensive parallel connectors and backplanes/cables
– Examples: VMEbus, Sbus, ISA bus, PCI, SCSI, IDE
• Dedicated point-to-point serial links
– Point-to-point links run at multi-gigabit speed using advanced clock/signal encoding (requires lots of circuitry at each end)
– Lower power since only one well-behaved load
– Multiple simultaneous transfers
– Cheap cables and connectors (trade greater endpoint transistor cost for lower physical wiring cost); customize bandwidth per device with multiple links
– Examples: Ethernet, Infiniband, PCI Express, SATA, USB, Firewire, etc.

[Diagram: a shared bus with a central bus arbiter connecting CPU, I/O interface, and I/O devices, versus dedicated point-to-point serial links from the I/O interface to each device.]

Page 15:

Main Components of Intel Chipset: Pentium 4

• Northbridge:
– Handles memory
– Graphics
• Southbridge: I/O
– PCI bus
– Disk controllers
– USB controllers
– Audio
– Serial I/O
– Interrupt controller
– Timers

Page 16:

[Diagram: a Device Controller containing a bus interface, addressable memory and/or queues, and read/write/control/status registers; the registers are reachable either as an I/O port (e.g., port 0x20) or through a memory-mapped region (e.g., 0x8f008020); the hardware controller sits behind them.]

How does the processor actually talk to the device?

• CPU interacts with a controller
– Contains a set of registers that can be read and written
– May contain memory for request queues or bit-mapped images
• Regardless of the complexity of the connections and buses, the processor accesses the registers in two ways (sketched below):
– I/O instructions: in/out instructions
» Example from the Intel architecture: out 0x21,AL
– Memory-mapped I/O: load/store instructions
» Registers/memory appear in the physical address space
» I/O accomplished with load and store instructions
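A minimal sketch of the two access styles, not taken from the slides: the register layout, the busy bit, and the use of the base address 0x8f008020 from the figure above are purely illustrative, and the port-I/O helper is the usual GCC inline-assembly idiom. On a hosted OS the region would first have to be mapped uncached (e.g., from a PCI BAR).

```c
#include <stdint.h>

/* Memory-mapped I/O: the device's registers appear at physical addresses
 * (0x8f008020 in the figure), so ordinary loads and stores reach them.
 * 'volatile' keeps the compiler from caching or reordering the accesses.
 * The register layout here is assumed for illustration. */
#define DEV_REG_BASE 0x8f008020u

typedef struct {
    volatile uint32_t status;   /* read-only status bits          */
    volatile uint32_t command;  /* writing here triggers the device */
    volatile uint32_t data;     /* data in/out                     */
} dev_regs_t;

static dev_regs_t *const dev = (dev_regs_t *)DEV_REG_BASE;

static void mmio_start_op(uint32_t cmd)
{
    while (dev->status & 0x1)   /* spin while device busy (bit 0, assumed) */
        ;
    dev->command = cmd;         /* a plain store starts the operation */
}

/* Port-mapped I/O (x86 only): a separate I/O address space reached with
 * the in/out instructions, as in the slide's "out 0x21,AL" example. */
static inline void outb(uint16_t port, uint8_t val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}
```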

[Diagram: CPU and regular memory on the processor-memory bus; an interrupt controller delivers interrupt requests to the CPU; bus adaptors carry address + data traffic out to other devices or buses.]

Page 17:

Example: Memory-Mapped Display Controller

• Memory-mapped:
– Hardware maps control registers and display memory into the physical address space
» Addresses set by hardware jumpers or by programming at boot time
– Simply writing to display memory (also called the "frame buffer") changes the image on screen
» Addr: 0x8000F000 to 0x8000FFFF
– Writing a graphics description to the command-queue area
» Say, enter a set of triangles that describe some scene
» Addr: 0x80010000 to 0x8001FFFF
– Writing to the command register may cause the on-board graphics hardware to do something
» Say, render the above scene
» Addr: 0x0007F004
• Can protect with page tables (a write sequence is sketched below)
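The following sketch uses the addresses from this slide; everything written into them (the pixel format, the command-queue record layout, and the command code) is invented for illustration, not part of any real controller.

```c
#include <stdint.h>

/* Addresses from the slide; contents and meanings are assumed. */
#define FRAME_BUF   ((volatile uint8_t  *)0x8000F000u)  /* display memory        */
#define CMD_QUEUE   ((volatile uint32_t *)0x80010000u)  /* graphics command queue */
#define STATUS_REG  ((volatile uint32_t *)0x0007F000u)  /* status register        */
#define CMD_REG     ((volatile uint32_t *)0x0007F004u)  /* command register       */

void draw_and_render(void)
{
    /* 1. Writing display memory changes the picture directly. */
    FRAME_BUF[0] = 0xFF;          /* light up the first byte of the frame buffer */

    /* 2. Describe a scene by appending records to the command queue. */
    CMD_QUEUE[0] = 3;             /* e.g. "3 triangles follow" (assumed format) */

    /* 3. Poke the command register to start the on-board hardware,
     *    then poll status until it reports idle (bit 0, assumed). */
    *CMD_REG = 0x1;               /* hypothetical RENDER command code */
    while (*STATUS_REG & 0x1)
        ;
}
```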

[Diagram of the physical address space: display memory at 0x8000F000, the graphics command queue from 0x80010000 up to 0x80020000, and the status/command registers at 0x0007F000/0x0007F004.]

Page 18:

Hard Disk Drives

[Photos: an IBM/Hitachi Microdrive and a Western Digital drive; side view of a read/write head.]

http://www.storagereview.com/guide/

Page 19:

Historical Perspective

• 1956 IBM RAMAC; early 1970s Winchester
– Developed for mainframe computers, proprietary interfaces
– Steady shrink in form factor: 27 in. to 14 in.
• Form factor and capacity drive the market more than performance
• 1970s developments
– 5.25 inch floppy disk form factor (microcode into mainframe)
– Emergence of industry-standard disk interfaces
• Early 1980s: PCs and first-generation workstations
• Mid 1980s: client/server computing
– Centralized storage on file servers
» accelerates disk downsizing: 8 inch to 5.25 inch
– Mass-market disk drives become a reality
» industry standards: SCSI, IPI, IDE
» 5.25 inch to 3.5 inch drives for PCs; end of proprietary interfaces
• 1990s: laptops => 2.5 inch drives
• 2000s: shift to perpendicular recording
– 2007: Seagate introduces 1 TB drive
– 2009: Seagate/WD promise 2 TB drives

Page 20:

Disk History

Data density (Mbit/sq. in.) and capacity of the unit shown (MBytes):
– 1973: 1.7 Mbit/sq. in., 140 MBytes
– 1979: 7.7 Mbit/sq. in., 2,300 MBytes

source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"

Page 21:

Disk History

– 1989: 63 Mbit/sq. in., 60,000 MBytes
– 1997: 1,450 Mbit/sq. in., 2,300 MBytes
– 1997: 3,090 Mbit/sq. in., 8,100 MBytes

source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"

Page 22:

Seagate Barracuda

• 2 TB! 400 GB/in²
• 4 platters, 2 heads each
• 3.5" platters
• Perpendicular recording
• 7200 RPM
• 4.2 ms latency (?)
• 100 MB/sec transfer speed
• 32 MB cache

Page 23:

Properties of a Hard Magnetic Disk

• Properties
– Independently addressable element: sector
» The OS always transfers groups of sectors together: "blocks"
– A disk can directly access any given block of information it contains (random access); any file can be accessed either sequentially or randomly
– A disk can be rewritten in place: it is possible to read/modify/write a block on the disk
• Typical numbers (depending on the disk size):
– 500 to more than 20,000 tracks per surface
– 32 to 800 sectors per track
» A sector is the smallest unit that can be read or written
• Zoned bit recording
– Constant bit density: more sectors on outer tracks
– Speed varies with track location

[Diagram: platters with concentric tracks, each divided into sectors.]

Page 24:

MBits per square inch: DRAM as % of Disk over time

[Chart: DRAM areal density as a percentage of disk areal density, 1974-1998, on a 0%-50% scale; annotated points: 0.2 vs. 1.7 Mb/sq. in., 9 vs. 22 Mb/sq. in., and 470 vs. 3000 Mb/sq. in.]

source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"

Page 25:

Nano-layered Disk Heads

• The special sensitivity of the disk head comes from the "Giant Magneto-Resistive effect" (GMR)
• IBM is (was) the leader in this technology
– Same technology as the TMJ-RAM breakthrough

[Diagram: head structure, including the coil for writing.]

Page 26:

Disk Figure of Merit: Areal Density

• Bits recorded along a track
– Metric is Bits Per Inch (BPI)
• Number of tracks per surface
– Metric is Tracks Per Inch (TPI)
• Disk designs brag about bit density per unit area
– Metric is Bits Per Square Inch: Areal Density = BPI x TPI

Year    Areal Density (Mbit/sq. in.)
1973          2
1979          8
1989         63
1997      3,090
2000     17,100
2006    130,000
2007    164,000
2009    400,000

[Chart: areal density vs. year, 1970-2010, log scale from 1 to 1,000,000.]
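As a sanity check on these numbers (my arithmetic, not from the slides), the implied compound growth rate over the full span is

$$
\left(\frac{400{,}000}{2}\right)^{1/(2009-1973)} = 200{,}000^{1/36} \approx 1.40,
$$

i.e. roughly 40% per year averaged over 1973-2009. The later decades run faster (1989-2009 works out to about 55% per year), consistent with the ~60%/year figure quoted in the conclusion slide.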

Page 27:

Newest technology: Perpendicular Recording

• In Perpendicular recording:– Bit densities much higher– Magnetic material placed on top of magnetic underlayer that reflects

recording head and effectively doubles recording field

Page 28:

Disk I/O Performance

Response Time = Queue + Disk Service Time

[Diagram: a request from a user thread passes through a queue (the OS software paths), then the controller, then the disk itself.]

• Performance of disk drive/file system
– Metrics: response time, throughput
– Contributing factors to latency:
» Software paths (can be loosely modeled by a queue)
» Hardware controller
» Physical disk media
• Queuing behavior:
– Can lead to a big increase in latency as utilization approaches 100%

[Plot: response time (ms, 0 to 300) versus throughput/utilization (0% to 100% of total bandwidth); response time grows sharply as utilization approaches 100%.]

Page 29:

Magnetic Disk Characteristics

• Cylinder: all the tracks under the heads at a given arm position, on all surfaces
• Reading/writing data is a three-stage process:
– Seek time: position the head/arm over the proper track (into the proper cylinder)
– Rotational latency: wait for the desired sector to rotate under the read/write head
– Transfer time: transfer a block of bits (sector) under the read/write head
• Disk Latency = Queueing Time + Controller Time + Seek Time + Rotation Time + Xfer Time
• Highest bandwidth:
– transfer a large group of blocks sequentially from one track

[Diagrams: disk geometry showing platter, head, cylinder, track, and sector; and the request path, in which a request passes through the software queue (device driver), the hardware controller, and the media time (seek + rotation + transfer) before the result returns.]

Page 30:

Disk Time Example

• Disk parameters:
– Transfer size is 8K bytes
– Advertised average seek is 12 ms
– Disk spins at 7200 RPM
– Transfer rate is 4 MB/sec
• Controller overhead is 2 ms
• Assume the disk is idle, so there is no queuing delay
• Disk Latency = Queuing Time + Seek Time + Rotation Time + Xfer Time + Ctrl Time
• What is the average disk access time for a sector?
– Avg seek + avg rotational delay + transfer time + controller overhead
– 12 ms + [0.5/(7200 RPM/60 s)] x 1000 ms/s + [8192 bytes/(4x10^6 bytes/s)] x 1000 ms/s + 2 ms
– 12 + 4.17 + 2.05 + 2 = 20.22 ms
• Advertised seek time assumes no locality; in practice the average seek is typically 1/4 to 1/3 of the advertised value: 12 ms => 4 ms
(calculation sketched below)
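The same arithmetic as a small, self-contained C helper (a sketch; the function and parameter names are mine, the input numbers are the slide's):

```c
#include <stdio.h>

/* Average access time for one sector, in milliseconds.
 * Average rotational delay = half a revolution = 0.5 / (RPM/60) seconds. */
static double disk_access_ms(double seek_ms, double rpm,
                             double xfer_bytes_per_s, double sector_bytes,
                             double ctrl_ms)
{
    double rot_ms  = 0.5 / (rpm / 60.0) * 1000.0;
    double xfer_ms = sector_bytes / xfer_bytes_per_s * 1000.0;
    return seek_ms + rot_ms + xfer_ms + ctrl_ms;
}

int main(void)
{
    /* Numbers from the slide: 12 ms seek, 7200 RPM, 4 MB/s, 8 KB, 2 ms. */
    printf("%.2f ms\n", disk_access_ms(12.0, 7200.0, 4.0e6, 8192.0, 2.0));
    return 0;   /* prints 20.21 ms; the slide rounds each term first and reports 20.22 */
}
```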

Page 31:

Typical Numbers for a Magnetic Disk

• Average seek time as reported by the industry:
– Typically in the range of 4 ms to 12 ms
– Due to locality of disk references, may only be 25% to 33% of the advertised number
• Rotational latency:
– Most disks rotate at 3,600 to 7,200 RPM (up to 15,000 RPM or more)
– Approximately 16 ms to 8 ms per revolution, respectively
– The average latency to the desired information is halfway around the disk: 8 ms at 3,600 RPM, 4 ms at 7,200 RPM
• Transfer time is a function of:
– Transfer size (usually a sector): 1 KB/sector
– Rotation speed: 3,600 RPM to 15,000 RPM
– Recording density: bits per inch on a track
– Diameter: ranges from 1 in to 5.25 in
– Typical values: 2 to 50 MB per second
• Controller time?
– Depends on the controller hardware; need to examine each case individually

Page 32:

Introduction to Queuing Theory

[Diagram: arrivals enter a queuing system and depart after service; inside the box, requests pass through a queue, then the controller, then the disk.]

• What about queuing time??
– Let's apply some queuing theory
– Queuing theory applies to long-term, steady-state behavior: arrival rate = departure rate
• Little's Law:
– Mean # tasks in system = arrival rate x mean response time
– Observed by many; Little was the first to prove it
– Simple interpretation: you should see the same number of tasks in the queue when entering as when leaving
• Applies to any system in equilibrium, as long as nothing in the black box is creating or destroying tasks
– Typical queuing theory doesn't deal with transient behavior, only steady-state behavior
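As a quick illustration of Little's Law (numbers chosen to match the example a few slides ahead): with an arrival rate of lambda = 10 requests/s and a mean response time of 25 ms,

$$
N = \lambda \times T_{sys} = 10/\text{s} \times 0.025\,\text{s} = 0.25,
$$

so on average only a quarter of a request is resident in the disk subsystem at any instant.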

Page 33:

Background: Use of Random Distributions

• Server spends a variable time with customers
– Mean (average): m1 = Σ p(T) x T
– Variance: σ² = Σ p(T) x (T - m1)² = Σ p(T) x T² - m1²
– Squared coefficient of variance: C = σ²/m1²
(an aggregate description of the distribution)
• Important values of C:
– No variance, or deterministic => C = 0
– "Memoryless" or exponential => C = 1
» Past tells nothing about the future
» Many complex systems (or aggregates) are well described as memoryless
– Disk response times: C ≈ 1.5 (the majority of seeks are shorter than the average)
• Mean Residual Wait Time, m1(z):
– Mean time one must wait for the server to complete its current task
– Can derive: m1(z) = ½ m1 (1 + C)
» Not just ½ m1, because that doesn't capture the variance
– C = 0 => m1(z) = ½ m1;  C = 1 => m1(z) = m1

[Figure: a memoryless (exponential) distribution of service times, with the mean m1 marked.]

Page 34:

A Little Queuing Theory: Mean Wait Time

• Parameters that describe our system:
– λ: mean number of arriving customers/second
– Tser: mean time to service a customer ("m1")
– C: squared coefficient of variance = σ²/m1²
– μ: service rate = 1/Tser
– u: server utilization (0 ≤ u ≤ 1): u = λ/μ = λ x Tser
• Parameters we wish to compute:
– Tq: time spent in the queue
– Lq: length of the queue = λ x Tq (by Little's Law)
• Basic approach:
– Customers before us must finish; mean time = Lq x Tser
– If something is at the server, it takes m1(z) to complete on average
» Chance the server is busy = u => mean time is u x m1(z)
• Computation of wait time in queue (Tq):
– Tq = Lq x Tser + u x m1(z)

[Diagram: arrivals at rate λ enter a queue feeding a server with service rate μ = 1/Tser.]

Page 35:

Mean Residual Wait Time: m1(z)

• Imagine n samples
– There are n x P(Tx) samples of size Tx
– Total space of samples of size Tx: n x P(Tx) x Tx
– Total time for n services: n x Σx P(Tx) x Tx = n x Tser
– Chance of arriving during a service of length Tx: [n x P(Tx) x Tx] / [n x Tser] = P(Tx) x Tx / Tser
– Average remaining time if we land in Tx: ½ Tx
– Finally, the Average Residual Time m1(z):

$$
m_1(z) \;=\; \sum_x \tfrac{1}{2}T_x \cdot \frac{P(T_x)\,T_x}{T_{ser}}
\;=\; \frac{1}{2\,T_{ser}} \sum_x P(T_x)\,T_x^{2}
\;=\; \frac{E[T^{2}]}{2\,T_{ser}}
\;=\; \frac{\sigma^{2} + T_{ser}^{2}}{2\,T_{ser}}
\;=\; \tfrac{1}{2}\,T_{ser}\,(1 + C)
$$

[Figure: a timeline of consecutive service intervals T1, T2, T3, ..., Tn making up the total time for n services, with a random arrival point landing somewhere inside one of them.]

Page 36:

A Little Queuing Theory: M/G/1 and M/M/1

• Computation of wait time in queue (Tq):
– Tq = Lq x Tser + u x m1(z)
– Tq = λ x Tq x Tser + u x m1(z)   (Little's Law: Lq = λ x Tq)
– Tq = u x Tq + u x m1(z)          (definition of utilization: u = λ x Tser)
– Tq x (1 - u) = m1(z) x u
– Tq = m1(z) x u/(1 - u)
– Tq = Tser x ½(1 + C) x u/(1 - u)
• Notice that as u → 1, Tq → ∞!
• Assumptions so far:
– System in equilibrium; no limit to the queue; works First-In-First-Out
– Times between two successive arrivals are random and memoryless ("M" for C = 1, exponentially distributed)
– Server can start on the next customer immediately after the prior one finishes
• General service distribution (no restrictions), 1 server:
– Called an M/G/1 queue: Tq = Tser x ½(1 + C) x u/(1 - u)
• Memoryless service distribution (C = 1):
– Called an M/M/1 queue: Tq = Tser x u/(1 - u)

Page 37:

A Little Queuing Theory: An Example

• Example usage statistics:
– User requests 10 x 8 KB disk I/Os per second
– Requests & service exponentially distributed (C = 1.0)
– Avg. service = 20 ms (from controller + seek + rotation + transfer)
• Questions:
– How utilized is the disk?
» Ans: server utilization, u = λ x Tser
– What is the average time spent in the queue?
» Ans: Tq
– What is the number of requests in the queue?
» Ans: Lq
– What is the avg response time for a disk request?
» Ans: Tsys = Tq + Tser
• Computation:
– λ (avg # arriving customers/s) = 10/s
– Tser (avg time to service a customer) = 20 ms (0.02 s)
– u (server utilization) = λ x Tser = 10/s x 0.02 s = 0.2
– Tq (avg time/customer in queue) = Tser x u/(1 - u) = 20 x 0.2/(1 - 0.2) = 20 x 0.25 = 5 ms (0.005 s)
– Lq (avg length of queue) = λ x Tq = 10/s x 0.005 s = 0.05
– Tsys (avg time/customer in system) = Tq + Tser = 25 ms
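The same sizing exercise as a tiny C program (my own sketch; the variable names are mine, the formula is the M/M/1 result derived on the previous slide):

```c
#include <stdio.h>

/* M/M/1 sizing for the example above: lambda = 10 req/s, Tser = 20 ms. */
int main(void)
{
    double lambda = 10.0;                  /* arrivals per second  */
    double t_ser  = 0.020;                 /* mean service time, s */

    double u      = lambda * t_ser;        /* utilization           */
    double t_q    = t_ser * u / (1.0 - u); /* mean time in queue    */
    double l_q    = lambda * t_q;          /* mean queue length     */
    double t_sys  = t_q + t_ser;           /* mean response time    */

    printf("u = %.2f, Tq = %.1f ms, Lq = %.2f, Tsys = %.1f ms\n",
           u, t_q * 1e3, l_q, t_sys * 1e3);
    /* prints: u = 0.20, Tq = 5.0 ms, Lq = 0.05, Tsys = 25.0 ms */
    return 0;
}
```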

Page 38:

Use Arrays of Small Disks?

[Figure: conventional disk designs in 14", 10", 5.25", and 3.5" form factors, from low end to high end, versus a disk array built from a single 3.5" disk design.]

• Katz and Patterson asked in 1987:
• Can smaller disks be used to close the gap in performance between disks and CPUs?

Page 39:

Array Reliability

• Reliability of N disks = Reliability of 1 Disk ÷ N

50,000 Hours ÷ 70 disks = 700 hours

Disk system MTTF: Drops from 6 years to 1 month!

• Arrays (without redundancy) too unreliable to be useful!

Hot spares support reconstruction in parallel with access: very high media availability can be achieved
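Spelling out the arithmetic behind the claim (assuming independent failures and a 50,000-hour per-disk MTTF):

$$
MTTF_{array} = \frac{MTTF_{disk}}{N} = \frac{50{,}000\ \text{h}}{70} \approx 714\ \text{h} \approx 1\ \text{month},
$$

versus 50,000 h ≈ 5.7 years for a single disk, which is the "6 years to 1 month" drop quoted above (the slide rounds 714 hours down to 700).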

Page 40:

Redundant Arrays of Disks

• Files are "striped" across multiple spindles
• Redundancy yields high data availability
– Disks will fail
– Contents are reconstructed from data redundantly stored in the array
» Capacity penalty to store it
» Bandwidth penalty to update
• Techniques:
– Mirroring/Shadowing (high capacity cost)
– Horizontal Hamming codes (overkill)
– Parity & Reed-Solomon codes
– Failure prediction (no capacity overhead!)
» VaxSimPlus: the technique is controversial

Page 41:

Redundant Arrays of Disks: RAID 1, Disk Mirroring/Shadowing

• Each disk is fully duplicated onto its "shadow": very high availability can be achieved
• Bandwidth sacrifice on write: a logical write = two physical writes
• Reads may be optimized
• Most expensive solution: 100% capacity overhead
• Targeted for high-I/O-rate, high-availability environments

[Diagram: each disk paired with its shadow copy in a recovery group.]

Page 42:

Redundant Arrays of Disks: RAID 5+, High I/O Rate Parity

• A logical write becomes four physical I/Os
• Independent writes are possible because of the interleaved parity
• Reed-Solomon codes ("Q") for protection during reconstruction
• Targeted for mixed applications

Parity placement (columns are disk columns; logical disk addresses increase across each row; a row of stripe units forms a stripe):

D0   D1   D2   D3   P
D4   D5   D6   P    D7
D8   D9   P    D10  D11
D12  P    D13  D14  D15
P    D16  D17  D18  D19
D20  D21  D22  D23  P
...

Page 43:

Problems of Disk Arrays: Small Writes

RAID-5 Small Write Algorithm: 1 logical write = 2 physical reads + 2 physical writes

• Starting stripe: D0 D1 D2 D3 P; new data D0' arrives
• (1. Read) old data D0 and (2. Read) old parity P
• XOR the new data with the old data, then XOR the result with the old parity to obtain the new parity P'
• (3. Write) new data D0' and (4. Write) new parity P', giving the stripe D0' D1 D2 D3 P'
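A minimal sketch of the parity update (my own code, operating on in-memory buffers; a real array issues the two reads and two writes against different disks):

```c
#include <stddef.h>
#include <stdint.h>

/* RAID-5 small-write parity update:
 *   P' = P_old XOR D_old XOR D_new
 * The new parity is computed from only the old data block and the old
 * parity block, without touching the other data disks in the stripe. */
void raid5_small_write_parity(const uint8_t *old_data,
                              const uint8_t *new_data,
                              const uint8_t *old_parity,
                              uint8_t *new_parity,
                              size_t block_len)
{
    for (size_t i = 0; i < block_len; i++)
        new_parity[i] = old_parity[i] ^ old_data[i] ^ new_data[i];
}
```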

Page 44:

System Availability: Orthogonal RAIDs

[Diagram: an array controller fans out to several string controllers, each driving a string of disks.]

• Data recovery group: unit of data redundancy
• Redundant support components: fans, power supplies, controller, cables
• End-to-end data integrity: internal parity-protected data paths

Page 45:

System-Level Availability

• Goal: no single points of failure
• Fully dual redundant: two hosts, two I/O controllers, and two array controllers, with recovery groups of disks reachable through either path

[Diagram: host / I/O controller / array controller pairs cross-connected to shared recovery groups of disks.]

• With duplicated paths, higher performance can be obtained when there are no failures

Page 46:

OceanStore: Global-Scale Persistent Storage

Page 47:

Utility-based Infrastructure

• Service provided by a confederation of companies
– Monthly fee paid to one service provider
– Companies buy and sell capacity from each other

[Diagram: a global OceanStore spanning providers such as Pac Bell, Sprint, IBM, AT&T, and a Canadian OceanStore.]

Page 48:

Important P2P Technology (Decentralized Object Location and Routing)

[Diagram: a DOLR network locating and routing to objects named by GUIDs (GUID1, GUID2).]

Page 49:

Peer-to-peer systems can be very stable

(May 2003: 1.5 TB over 4 hours.) In JSAC, to appear.

Page 50:

The Path of an OceanStore Update

[Diagram: clients, inner-ring servers, second-tier caches, and the multicast trees that connect them along the path of an update.]

Page 51:

Archival Dissemination of Fragments

Page 52:

Aside: Why Erasure Coding? High Durability/Overhead Ratio!

• Exploit the law of large numbers for durability!
• With a 6-month repair interval, the Fraction of Blocks Lost Per Year (FBLPY) is:
– Replication: 0.03
– Fragmentation (erasure coding): 10^-35

Page 53:

Conclusion

• Disk industry growing rapidly; improves:
– bandwidth 40%/yr,
– areal density 60%/yr, $/MB faster?
• Disk Time = queue + controller + seek + rotate + transfer
• Advertised average seek time is much greater than the average seek time in practice
• Redundancy is useful to gain reliability
– Redundant disks + controllers + etc. (RAID)
– Geographical-scale systems (OceanStore)

• Queueing theory:

$$
T_q \;=\; T_{ser}\times\frac{1+C}{2}\times\frac{u}{1-u},
\qquad\text{and for } C = 1:\qquad
T_q \;=\; T_{ser}\times\frac{u}{1-u}
$$