
1

CSC 2233:

Topics in Computer System Performance and Reliability: Storage Systems!

Note: some of the slides in today’s lecture are borrowed from a course taught by Greg Ganger and Garth Gibson at Carnegie Mellon University

2

Who am I?

3

What makes storage systems so cool?

1. Combines so many topic areas:
  ■ hardware meets OS meets networking meets distributed systems meets security meets AI meets HCI…

4

What makes storage systems so cool?

1. Combines so many topic areas
2. This is where great jobs are!
  ■ Designers and implementers still needed
    ● not just testing :-)
  ■ Continuing growth area for the future
    ● The Internet is a network, but the web is a storage system
    ● Strong existing companies: EMC, NetApp, …
    ● Core competency for Internet services: Google, Microsoft, Amazon, …
    ● and still support for start-ups

5

What makes storage systems so cool?

1. Combines so many topic areas
2. Great careers
3. Still so much room to contribute:
  ■ performance actually matters here
    ● in fact, it dominates other parts of system performance in many cases
  ■ … and reliability too
  ■ storage management wide open
  ■ and, storage starting to “take over” computation

6

Amdahl’s Law

◆ Speedup limited to fraction improved
  ■ obvious, but fundamental, observation

[Figure: two stacked bars. Before: 50 + 50. After: 50 + 5.]

A 90% reduction in BLUE yields only a 45% reduction in total

◆ What does this mean for storage systems?
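The arithmetic behind the figure can be checked with a short sketch. The function names are illustrative and not part of the original slides; the numbers (two 50-unit parts, one of them reduced by 90%) come from the slide.

```python
def total_reduction(fraction_improved: float, reduction_in_part: float) -> float:
    """Fractional reduction in total time when only one part gets faster."""
    return fraction_improved * reduction_in_part


def overall_speedup(fraction_improved: float, reduction_in_part: float) -> float:
    """Amdahl's Law expressed as old_time / new_time."""
    return 1.0 / (1.0 - total_reduction(fraction_improved, reduction_in_part))


if __name__ == "__main__":
    # Slide example: 50 + 50 time units, the BLUE half shrinks by 90%.
    print(f"total reduction: {total_reduction(0.5, 0.9):.2f}")
    print(f"overall speedup: {overall_speedup(0.5, 0.9):.2f}x")
```

Even if the improved half shrank to zero, the total could drop by at most 50%, which is the sense in which speedup is limited by the fraction improved.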

7

Technology Trends

[Figure: log-scale plot (1 to 100) of value normalized relative to 2000, for the years 2000 to 2010. Series: CPU performance, memory bandwidth, disk bandwidth, network bandwidth, network latency, disk latency.]

8

Consequence: storage performance dominates

• Assume 50 seconds CPU & 50 seconds I/O
• CPU improves by 2X every 2 years
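A minimal sketch of where this assumption leads. It additionally assumes that I/O time does not improve at all, which is a simplification made here for illustration and is not stated on the slide; `runtime_after` is a hypothetical helper name.

```python
def runtime_after(years: float,
                  cpu_seconds: float = 50.0,
                  io_seconds: float = 50.0,
                  cpu_doubling_years: float = 2.0):
    """Return (total_seconds, io_fraction) after `years`.

    Assumption: only the CPU part speeds up (2X every `cpu_doubling_years`);
    the I/O part is held constant.
    """
    cpu = cpu_seconds / 2 ** (years / cpu_doubling_years)
    total = cpu + io_seconds
    return total, io_seconds / total


if __name__ == "__main__":
    for years in (0, 2, 4, 6, 8, 10):
        total, io_fraction = runtime_after(years)
        print(f"year {years:2d}: total {total:6.2f} s, I/O share {io_fraction:.0%}")
```

Under these assumptions the total run time can never drop below 50 seconds, and the I/O share approaches 100%.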


10

Storage systems: fun quotes

“I/O certainly has been lagging in the last decade”
  ● Seymour Cray, 1976

“Also, I/O needs a lot of work”
  ● David Kuck, 1988

“In 3 to 5 years, we will start seeing servers as peripherals to storage”
  ● SUN Chief Technology Officer, 1998

“Scalable I/O is perhaps the most overlooked area of high-performance computing R&D”
  ● Suggested R&D topic report for 2005-2009

11

Remainder of the course

◆ Devices:
  ■ Hard disk drives, solid state drives
◆ Local file systems
  ■ File system organizations
  ■ File system integrity/consistency
◆ NVM file systems
◆ Distributed file systems
◆ Parallel file systems
◆ Extremely scalable storage (Google & Co)
◆ Reliability & fault tolerance

12

Logistics & Administrative Details

◆ Class time: Wed 10am – 12pm
◆ Office hours:
  ■ By appointment
◆ Class web page
  ■ www.cs.toronto.edu/~bianca/csc2233.html
  ■ Still undergoing updates
◆ 11 weeks of lectures
◆ Course project due end of the semester

13

Grading

◆ 30% class participation
  ■ Participation in class discussions
    ● (Read all papers prior to class)
  ■ Class presentation of research paper
  ■ Possibly quizzes (10%)
◆ 70% class project
◆ No exams, no homework, no paper summaries

14

Class project

◆ Can be done in a team of two or alone
  ■ Start looking for a partner now!
◆ We will suggest possible projects (see course web page)
◆ Output: workshop-quality research paper (10-12 pages)
  ■ Even better: conference-quality paper
  ■ Use the LaTeX template on the course web page
  ■ All reports will be published as tech reports
◆ We will help you get there, with multiple milestones
◆ And meetings with TA/instructor

15

Paper presentation

◆ Each of you will present at least one paper in class
◆ Format of the presentation:
  ■ 25 min presentation of paper contents
  ■ 5-15 min paper review
    ● Good points
    ● Bad points
  ■ 10 min class discussion that you lead!
    ● Prepare questions!
◆ Keep the class engaged!

16

Paper presentation

◆ What I do not want:
  ■ A long laundry list of all things the paper did
◆ What I do want:
  ■ A lecture-style presentation of the paper
    ● Including background material your fellow classmates might need to understand the paper
  ■ A critical discussion of the paper
    ● Strengths & weaknesses
    ● Prepare questions!
◆ What you get:
  ■ Feedback!

17

Purpose of presentation

◆ Wrong answers:
  ■ “To give a verbal version of the paper, cramming all its content into 30 min”
  ■ “To impress people with your technical depth and thoroughness”
◆ In fact, no one cares about these things
  ■ The goal is to filter out the main points of the paper and present them well
  ■ By the end, everybody in the audience should remember 2-3 take-home messages

18

What’s on each slide?

◆ Control level of detail
◆ Each slide should have one basic point
◆ There should NOT be tons of text
◆ Use sentence fragments
◆ Use pictures everywhere you possibly can!
  ■ A picture says more than 1000 words
  ■ Saves text and thus slides
  ■ Much easier to process

19

Rest of today: Some review …

20

What are storage systems all about?

◆ Memory/storage hierarchy

21

Memory/storage hierarchies

◆ Balancing performance with cost
  ■ Small memories are fast but expensive
  ■ Large memories are slow but cheap
◆ Exploit locality to get the best of both worlds
  ■ locality = re-use/nearness of accesses
  ■ allows most accesses to use small, fast memory

[Figure: hierarchy of L1/L2 cache, L3 cache, DRAM, SSD, and hard disk, with axes labeled Capacity and Performance.]

22

Example memory hierarchy values

Notice the huge access time gap between DRAM and disk

SSDs (tens of microsecs)
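To see why the gap matters, here is a rough sketch of average access time for a two-level hierarchy. The latencies are assumed round numbers chosen for illustration (only "tens of microseconds" for SSDs comes from the slide), and the model charges a miss only the slow-level time, which is a simplification.

```python
# Assumed, illustrative latencies in nanoseconds (not taken from the slides).
DRAM_NS = 100.0
SSD_NS = 50_000.0        # "tens of microseconds"
DISK_NS = 5_000_000.0    # a few milliseconds


def average_access_time_ns(hit_ratio: float, fast_ns: float, slow_ns: float) -> float:
    """Average latency when `hit_ratio` of accesses are served by the fast level."""
    return hit_ratio * fast_ns + (1.0 - hit_ratio) * slow_ns


if __name__ == "__main__":
    for hit_ratio in (0.9, 0.99, 0.999):
        disk = average_access_time_ns(hit_ratio, DRAM_NS, DISK_NS)
        ssd = average_access_time_ns(hit_ratio, DRAM_NS, SSD_NS)
        print(f"hit ratio {hit_ratio}: DRAM+disk {disk:,.0f} ns, DRAM+SSD {ssd:,.0f} ns")
```

With these assumed numbers, even a 99% hit ratio in DRAM leaves the average dominated by the rare disk accesses, which is why locality is so important.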

23


What are storage systems all about?

◆ Memory/storage hierarchy
  ■ Combining many technologies to balance costs/benefits
  ■ For a long time not the focal point of storage system design
    ● More interesting in recent years with SSDs and NVMs arriving on the market
◆ Persistence
  ■ Storing data for lengthy periods of time
  ■ To be useful, it must also be possible to find it again later
    ● this brings in data organization, consistency, and management issues
  ■ This is where the serious action is
    ● and it does relate to the memory/storage hierarchy

25

Why persistence is important

◆ Some statistics:
  ■ Among companies who lose data in a disaster, 50% never re-open and 90% are out of business within two years
  ■ Even smaller incidents can be costly
    ● Reproducing some tens of megabytes of accounting data can take several weeks and cost tens of thousands of dollars
  ■ Bad PR!

26

What is a storage system: Big Picture

[Figure: an Application hands data objects Bob1, Bob2, Bob3, Bob4 to the Storage System.]

◆ Application gives data objects & their IDs to storage
◆ The storage system keeps the data objects and returns one upon request (by ID)
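The big picture amounts to a put/get interface keyed by ID. A toy sketch follows, with a hypothetical class name and an in-memory dictionary standing in for the storage devices; it therefore shows only the interface, not persistence.

```python
class ToyStorageSystem:
    """Keeps data objects and returns one upon request (by ID).

    Illustration only: objects live in a Python dict, so nothing here
    survives a restart, unlike a real storage system.
    """

    def __init__(self):
        self._objects = {}

    def put(self, object_id: str, data: bytes) -> None:
        """The application gives a data object and its ID to storage."""
        self._objects[object_id] = data

    def get(self, object_id: str) -> bytes:
        """Return the object stored under `object_id` (KeyError if unknown)."""
        return self._objects[object_id]


if __name__ == "__main__":
    storage = ToyStorageSystem()
    for name in ("Bob1", "Bob2", "Bob3", "Bob4"):
        storage.put(name, f"contents of {name}".encode())
    print(storage.get("Bob2"))
```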

27

Storage Systems & Interfaces

◆ What is a “Storage System”?
  ■ Hardware (devices, controllers, interconnect) and Software (file system, device drivers, firmware) dedicated to providing management of and access to persistent storage.
◆ One view: defined by collection of interfaces

28

Storage Software Interfaces

[Figure: Program → File system → Device driver → I/O controller → Physical media, ranging from a high level of abstraction to no abstraction.]

◆ High-abstraction end: understands files and directories
◆ Low-abstraction end: HDD understands platters, cylinders, tracks, sectors

29

What’s inside a disk?

30

Disk structure – top view of single platter

◆ Surface organized into tracks
◆ Tracks organized into sectors

31

Disk service time components

◆ Components:
  ■ Seek
  ■ Rotational latency
  ■ Data transfer

[Figure panels: After BLUE read; Seek for RED; Rotational latency; After RED read.]

32

Seek time

◆ Time required to move head over desired track
◆ A real seek profile: [Figure: seek profile]
◆ Note that this is not linear!

33

Seek time

◆ Seek times are not linear because they have up to four components:
  ■ Accelerate
  ■ Coast at max velocity
    ● If going far enough to reach max velocity
  ■ Decelerate
  ■ Settle onto correct track
    ● Takes extra time to settle before writing
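The four phases above can be sketched as a simple kinematic model. This is a textbook-style approximation, not a measured profile: constant acceleration and deceleration, a velocity cap, and a fixed settle time. All parameter values are hypothetical.

```python
import math


def seek_time_ms(distance_cyl: float,
                 accel: float = 10_000.0,    # cylinders / ms^2 (hypothetical)
                 v_max: float = 20_000.0,    # cylinders / ms   (hypothetical)
                 settle_ms: float = 0.5) -> float:
    """Seek time for a move of `distance_cyl` cylinders.

    Phases: accelerate, (optionally) coast at v_max, decelerate, settle.
    """
    if distance_cyl == 0:
        return 0.0
    # Distance covered by accelerating to v_max and decelerating back to rest.
    ramp_distance = v_max ** 2 / accel
    if distance_cyl < ramp_distance:
        # Short seek: v_max is never reached; accelerate half way, then brake.
        move = 2.0 * math.sqrt(distance_cyl / accel)
    else:
        # Long seek: full ramp up and down plus a coast phase at v_max.
        move = 2.0 * v_max / accel + (distance_cyl - ramp_distance) / v_max
    return move + settle_ms


if __name__ == "__main__":
    for d in (1, 100, 10_000, 40_000, 100_000):
        print(f"{d:>7} cylinders: {seek_time_ms(d):.2f} ms")
```

In this model short seeks grow with the square root of the distance and are dominated by settle time, while long seeks grow linearly once the coast phase kicks in, so the overall curve is not a straight line.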

34

What is the average seek time?

◆ Watch out for misrepresentations
◆ What it is not:
  ■ Seek time for average of possible distances
  ■ Seek time for any LBN to any other
◆ What it is:
  ■ Depends on workload
  ■ Very different for sequential versus random workloads
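A small numeric sketch of why these quantities differ. It uses a made-up square-root seek curve and a tiny 200-cylinder disk, both chosen only for illustration, and compares the seek time of the average distance, the average seek time over all cylinder pairs, and the average for a mostly sequential workload.

```python
import math
from itertools import product


def seek_ms(distance: float) -> float:
    """Hypothetical non-linear seek curve: settle cost plus a sqrt term."""
    if distance == 0:
        return 0.0
    return 0.5 + 0.02 * math.sqrt(distance)


def mean_seek_ms(distances) -> float:
    """Average seek time over a list of seek distances."""
    distances = list(distances)
    return sum(seek_ms(d) for d in distances) / len(distances)


NUM_CYLINDERS = 200  # toy disk size

# Every (from, to) cylinder pair, equally likely: an "any to any" workload.
random_distances = [abs(a - b) for a, b in product(range(NUM_CYLINDERS), repeat=2)]
# Mostly sequential workload: 99 accesses on the same cylinder, then one step.
sequential_distances = [0] * 99 + [1]

seek_of_mean_distance = seek_ms(sum(random_distances) / len(random_distances))
mean_seek_random = mean_seek_ms(random_distances)
mean_seek_sequential = mean_seek_ms(sequential_distances)

if __name__ == "__main__":
    print(f"seek time of the average distance: {seek_of_mean_distance:.3f} ms")
    print(f"average seek time, random pairs:   {mean_seek_random:.3f} ms")
    print(f"average seek time, sequential:     {mean_seek_sequential:.4f} ms")
```

Because the curve is non-linear, the first two numbers differ, and the sequential workload sees an average far below either of them.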

35

Where does the disk head’s time go?

◆ Seek time, rotational latency, transfer time?

Random 4KB requests

36

Impact of request sizes?

◆ Seek time, rotational latency, transfer time?

37

Impact of locality?

◆ Seek time, rotational latency, transfer time?

38

Where does the disk head’s time go?

◆ Seek time: 1-6 ms, depending on distance
  ■ Improving at 7-10% per year
◆ Rotation speeds: 7,200-15,000 RPM
  ■ Average latency of 2-4 ms
  ■ Improving at 7-10% per year
◆ Data rates: 60-100 MB/s
  ■ Average sector transfer time of 25 µs
  ■ Improving at 30-40% per year
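These figures can be combined into a rough service-time estimate. The sketch below assumes that average rotational latency is half a revolution and that 1 MB = 10^6 bytes; the 4 ms seek in the example is a hypothetical value inside the 1-6 ms range above.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency, taken as half of one full revolution."""
    return 0.5 * 60_000.0 / rpm


def transfer_time_ms(num_bytes: int, mb_per_s: float) -> float:
    """Media transfer time, assuming 1 MB = 10**6 bytes."""
    return num_bytes / (mb_per_s * 1e6) * 1e3


def service_time_ms(seek_ms: float, rpm: float, num_bytes: int, mb_per_s: float) -> float:
    """Seek + average rotational latency + data transfer."""
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_time_ms(num_bytes, mb_per_s)


if __name__ == "__main__":
    for rpm in (7_200, 15_000):
        print(f"{rpm} RPM: average rotational latency {avg_rotational_latency_ms(rpm):.2f} ms")
    # Random 4 KB request: hypothetical 4 ms seek, 7,200 RPM, 100 MB/s.
    total = service_time_ms(4.0, 7_200, 4096, 100.0)
    transfer = transfer_time_ms(4096, 100.0)
    print(f"random 4 KB request: {total:.2f} ms total, {transfer:.3f} ms of it transfer")
```

For small random requests the transfer term is a tiny share of the total, so positioning (seek plus rotation) is where the head's time goes; larger requests or better locality shift the balance toward transfer.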

39

What’s inside a disk?

◆ The mechanics: [Figure: disk mechanics]
◆ The electronics (just like a small computer):
  ■ A processor
  ■ DRAM
  ■ Control ASIC

What are all those needed for?

40

Disk drive – what’s in a sector?

◆ Data
  ■ Typically 512 bytes
◆ Sync bytes = pattern to notify controller that data follows
◆ Header (ID information)
  ■ Cylinder, head, and sector number
◆ ECC (error correcting codes)
  ■ At such high densities, problems occur
  ■ ECC detects and corrects on the fly
  ■ “Tri-state guarantee” of sector writes
    ● All written
    ● All not written
    ● Sector destroyed
    ● NEVER: partially modified
◆ Servo = bit pattern used for centering on track

41


How is functionality implemented?

◆ Some in ASIC logic:
  ■ Error detection and correction
  ■ Servo processing
  ■ Motor-seek control
◆ Some in firmware running on control processor
  ■ Request processing, queueing, scheduling
  ■ LBN to PBN mapping

43

How to map LBN to PBN

◆ The view of the OS: [Figure: a linear array of numbered blocks]
◆ The reality: [Figure: physical disk layout]

44

LBN to physical mapping for single surface

45

Extending the mapping to a multi-surface disk

46

First complication: zones

Real disks don’t have a constant number of sectors per track

47

Multiple “zones”

48

Computing physical location from LBN

[Figure: an example zone breakdown]

◆ First, figure out which zone contains the LBN
  ■ i.e. which cylinder
◆ Then determine surface number
◆ Then determine sector number
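The three steps can be sketched as follows. The zone table, the surface count, and the fill order (a track first, then the next surface of the same cylinder, then the next cylinder) are assumptions made for illustration; defects and skew are ignored here.

```python
from typing import List, NamedTuple, Tuple


class Zone(NamedTuple):
    first_cylinder: int
    num_cylinders: int
    sectors_per_track: int


def lbn_to_physical(lbn: int, zones: List[Zone], num_surfaces: int) -> Tuple[int, int, int]:
    """Map an LBN to (cylinder, surface, sector) for a zoned disk."""
    if lbn < 0:
        raise ValueError("LBN must be non-negative")
    zone_start = 0  # first LBN of the zone currently being examined
    for zone in zones:
        blocks_per_cylinder = num_surfaces * zone.sectors_per_track
        zone_blocks = zone.num_cylinders * blocks_per_cylinder
        if lbn < zone_start + zone_blocks:
            # Step 1: zone found; locate the cylinder inside it.
            offset = lbn - zone_start
            cylinder = zone.first_cylinder + offset // blocks_per_cylinder
            within_cylinder = offset % blocks_per_cylinder
            # Step 2: surface number. Step 3: sector number.
            surface = within_cylinder // zone.sectors_per_track
            sector = within_cylinder % zone.sectors_per_track
            return cylinder, surface, sector
        zone_start += zone_blocks
    raise ValueError("LBN beyond end of disk")


# Hypothetical toy disk: 2 surfaces, outer zone with 8 sectors/track, inner with 6.
TOY_ZONES = [Zone(0, 2, 8), Zone(2, 2, 6)]

if __name__ == "__main__":
    for lbn in (0, 8, 31, 32, 39, 55):
        print(lbn, lbn_to_physical(lbn, TOY_ZONES, num_surfaces=2))
```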

49

Second complication: defect management

◆ Disks keep spare sectors
◆ Those are used in case portions of the media become unusable (both before and after installation)

50

Second complication: defect management

◆ First approach: remap broken sector, don’t touch anything else

51

Second complication: defect management

◆ Second approach: “slip” the mapping past the broken sector
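The two approaches can be contrasted on a single toy track. The layout (8 LBNs on a 10-slot track whose last two slots are spares) and the function names are assumptions for illustration; real drives keep spares in various places.

```python
def remap_mapping(num_lbns, defective, spare_slots):
    """First approach: only the broken sector moves, to a spare slot."""
    spares = list(spare_slots)
    mapping = []
    for slot in range(num_lbns):
        if slot in defective:
            if not spares:
                raise ValueError("out of spare sectors")
            mapping.append(spares.pop(0))
        else:
            mapping.append(slot)
    return mapping


def slip_mapping(num_lbns, defective, num_slots):
    """Second approach: skip the broken slot and shift all later LBNs by one."""
    mapping = []
    slot = 0
    for _ in range(num_lbns):
        while slot in defective:
            slot += 1
        if slot >= num_slots:
            raise ValueError("out of physical slots")
        mapping.append(slot)
        slot += 1
    return mapping


if __name__ == "__main__":
    # Index = LBN, value = physical slot; physical slot 3 is defective.
    print("remap:", remap_mapping(8, {3}, spare_slots=[8, 9]))
    print("slip: ", slip_mapping(8, {3}, num_slots=10))
```

In this toy layout remapping leaves every other LBN in place but sends one LBN far away, so a sequential read makes a detour to the spare; slipping keeps LBNs in physical order at the cost of shifting every later mapping.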

52

Third complication: skew


◆ It takes time to switch from one track to another
  ■ Sequential transfers suffer full rotation

55

Same request with track skew of one sector

◆ Track skew prevents unnecessary rotation
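A small model of the effect, measured in sector times. It assumes an 8-sector track and a track switch that takes half a sector time; both values and the function name are hypothetical.

```python
def track_switch_delay(sectors_per_track: int, skew: int, switch_time: float) -> float:
    """Delay, in sector times, between finishing the last sector of one track
    and starting the first LBN of the next track.

    `skew` is the rotational offset (in sectors) of the next track's first LBN.
    `switch_time` is how long the track switch takes, also in sector times.
    """
    # When the switch completes, the platter has rotated by `switch_time`.
    # The head then waits until the first LBN of the new track comes around.
    wait = (skew - switch_time) % sectors_per_track
    return switch_time + wait


if __name__ == "__main__":
    print("no skew:          ", track_switch_delay(8, skew=0, switch_time=0.5))
    print("skew of 1 sector: ", track_switch_delay(8, skew=1, switch_time=0.5))
```

Without skew the first sector of the new track has just passed by the time the switch completes, so the transfer waits almost a full rotation; with a skew of one sector the wait shrinks to a fraction of a sector time.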


57

How is functionality implemented?

◆ Some in ASIC logic:
  ■ Error detection and correction
  ■ Servo processing
  ■ Motor-seek control
◆ Some in firmware running on control processor
  ■ Request processing, queueing, scheduling
  ■ LBN to PBN mapping
    ● Zones
    ● Defects
    ● Skew
