solving io bottleneck

20
IMEX RESEARCH.COM © 201011 IMEX Research, Copying prohibited. All rights reserved. Are SSDs Ready for Enterprise Storage Systems Anil Vasudeva, President & Chief Analyst, IMEX Research Solving the IO Bottleneck in NextGen DataCenters & Cloud Computing Anil Vasudeva President & Chief Analyst [email protected] 408-268-0800 © 2007-11 IMEX Research All Rights Reserved Copying Prohibited Please write to IMEX for authorization IMEX RESEARCH.COM

Upload: anil-vasudeva

Post on 12-Jun-2015

855 views

Category:

Technology


0 download

DESCRIPTION

Solving IO BottleneckIMEX Research

TRANSCRIPT

Page 1: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

Are SSDs Ready for Enterprise Storage Systems

Anil Vasudeva, President & Chief Analyst, IMEX Research

Solving the IO Bottleneck in NextGen DataCenters &

Cloud Computing

Anil VasudevaPresident & Chief [email protected]

© 2007-11 IMEX ResearchAll Rights Reserved

Copying ProhibitedPlease write to IMEX for authorization

IMEXRESEARCH.COM

Page 2: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

22

Abstract

Solving the I/O Bottleneck in NextGen Data Centers & Cloud Computing

Virtualization brings tremendous advantages to data centers of all sizes – large and small. But the rapid proliferation of virtual machines (VMs) per physical server, in its wake, creates a highly randomized I/O problem, raising performance bottlenecks. To address this I/O issue, the IT industry and storage administrators in particular, have begun the adoption of a slew of newer technology solutions - ranging from faster and larger memories in cache, SSDs for higher IOPS cost effectively, high bandwidth (10GbE) networks using NPIVs for virtualized networks between VMs and shared storage systems along with embedded intelligence software to optimize various VM workloads – all in order to meet various SLA metrics of performance, availability, cost etc..Are you aware of the side-effects that get created from Server Virtualization and prepared to ask pertinent questions of your suppliers of IT Infrastructure Equipment, Storage Virtualization and Data Storage Management software as well as in their implementation to achieve targeted results in performance, availability, scalability, interoperability and data management in their virtualized data centersThis presentation provides an illustrative view of the impact of Server Virtualization on existing storage I/O solutions and best practices. It delineates the roles, capabilities and cost effectiveness of emerging technologies in mitigating the I/O bottlenecks so the IT infrastructure implementers can achieve their targeted performance under various workloads, from their storage systems in virtualized data centers.

Page 3: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

33

Agenda

• Data Centers & Cloud Infrastructure• Cloud Computing Architecture• Performance Metrics by Workload• Anatomy of Data Access• Data Center Performance Bottlenecks• Improving Query Response Time in OLTP• Role of SSD in Improving I/O Perf. Gap• SCM: A New Storage Class Memory SSDs • Price Erosion & IOPS/GB • Choosing SSD vs. Memory to Improve TPS• New Storage Usage Hierarchy in NGDC & Clouds • IO Bottleneck Mitigation in Virtualized Servers• I/O Forensics for Auto Storage-Tiering• Apps Benefitting from Improved I/O• Key Takeaways• Acknowledgements

Page 4: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

44

Data Centers & Cloud Infrastructure

Enterprise VZ Data CenterOn-Premise Cloud

Home Networks

Web 2.0Social Ntwks.

Facebook,Twitter, YouTube…

Cable/DSL…Cellular

Wireless

InternetISP

CoreOptical

Edge ISP

ISPISP

ISP

ISP

Supplier/Partners

Remote/Branch Office

Public CloudCenter©

Servers VPNIaaS, PaaS SaaS

Vertical Clouds

ISP

Tier-3Data Base

ServersTier-2 Apps

ManagementDirectory Security PolicyMiddleware Platform

Switches: Layer 4-7,Layer 2, 10GbE, FC Stg

Caching, Proxy, FW, SSL, IDS, DNS,

LB, Web Servers

Application ServersHA, File/Print, ERP, SCM, CRM Servers

Database Servers, Middleware, Data Mgmt

Tier-1 Edge Apps

FC/ IPSANs

Source:: IMEX Research - Cloud Infrastructure Report ©2009-11

Page 5: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

55

IT Industry’s Journey - Roadmap

CloudizationCloudizationOn-Premises > Private Clouds > Public CloudsDC to Cloud-Aware Infrast. & Apps. Cascade migration to SPs/Public Clouds.

Integrate Physical Infrast./Blades to meet CAPSIMS ®IMEX

Cost, Availability, Performance, Scalability, Inter-operability, Manageability & Security

Integration/ConsolidationIntegration/Consolidation

Standard IT Infrastructure- Volume Economics HW/Syst SW(Servers, Storage, Networking Devices, System Software (OS, MW & Data Mgmt SW)

StandardizationStandardization

VirtualizationVirtualizationPools Resources. Provisions, Optimizes, MonitorsShuffles Resources to optimize Delivery of various Business Services

Automatically Maintains Application SLAs (Self-Configuration, Self-Healing©IMEX, Self-Acctg. Charges etc)

AutomationAutomationSIVAC SIVAC

®IMEX

Source:: IMEX Research - Cloud Infrastructure Report ©2009-11

Page 6: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

Platform Tools & Services

Operating Systems

Cloud ComputingPublic Cloud Service

Providers Private Cloud

Enterprise

AppApp SLA

SLA

SaaS Applications

…...Net

Python

EJB

Ruby

PHP …..PaaS

IaaS

SaaS

Virtualization

Resources (Servers, Storage, Networks)

HybridCloud

AppApp SLA

SLA AppApp SLA

SLA AppApp SLA

SLA AppApp SLA

SLA

Man

agem

ent

Cloud Computing Architecture

Application’s SLA dictates the Resources Required to meet specific requirements of Availability, Performance, Cost, Security, Manageability etc.

Page 7: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

*IOPS for a required response time ( ms)*=(#Channels*Latency-1)

(RAID - 0, 3)

500100MB/sec

101 505

Data Warehousing

OLAP

BusinessIntelligence(RAID - 1, 5, 6)

IOPS

* (*L

aten

cy-1

)

Web 2.0Web 2.0AudioAudio

VideoVideo

Scientific ComputingScientific Computing

ImagingImagingHPCHPC

TPHPCHPC

Performance Metrics by Workload

10K

100 K

1K

100

10

1000 K OLTP

eCommerceeCommerceTransaction Transaction ProcessingProcessing

Source:: IMEX Research - Cloud Infrastructure Report ©2009-11

Page 8: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

Anatomy of Data Access

Disk I/O

Server

Processor

Perf

orm

ance

1980 1990 2000 2010

A 7.2K/15k rpm HDD can do 100/140 IOPS

For the time it takes to do each Disk Operation:- Millions of CPU Operations can be done- Hundreds of Thousands of Memory Operations can be accomplished

I/O Gap

Anatomy of Data Access

Time taken by CPU, Memory, Network, Disk

for a typical I/O Operation during a

Data Access

Page 9: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

Data Center Performance Bottlenecks

ServersWeb Servers Application Servers Database Servers

ClientsWindows, Linux, Unix

StorageWeb, Application, Database

LAN Access Networks

Storage I/O Access

Device BottlenecksDevice I/O Hotspots Cache Flush Lack of Storage Capacity

Server BottlenecksLack of Srvr Power IO Wait & Queuing CPU Overhead I/O Timeouts

User BottlenecksConnectivity Timeouts, Workload Surges

Storage I/O ConnectLack of Bandwidth Overloaded PCIe Connect Storage Device Contention

ApplicationsExcessive Locking Data Contention I/O Delays/Errors

Network I/ONetwork Congestion Dropped packets Data Retransmissions Timeouts Component Failures

Page 10: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

1010

• Improving Query Response Time• Cost effective way to improve Query response time for a given number

of users or servicing an increased number of users at a given response time is best served with use of SSDs or Hybrid (SSD + HDDs) approach, particularly for Database and Online Transaction Applications

HDDs14 Drives

HDDs 112 Drives

w short stroking SSDs

12 Drives$$$$$$$$

$$

$$$

IOPS (or Number of Concurrent Users)

Que

ry R

espo

nse

Tim

e (m

s)

0

8

2

4

6

0 20,000 40,00010,000 30,000

Conceptual Only -Not to Scale

Improving Query Response Time in OLTP

Hybrid HDDs

36 Drives + SSDs$$$$

Source: IMEX Research SSD Industry Report ©2011

Page 11: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

1111

Role of SSD in Improving I/O Perf. Gap

Performance I/O Access Latency

Price$/GB

HDD

Tape

DRAM

CPUSDRAM

HDD becoming Cheaper, not faster

DRAM getting Faster (to feed faster CPUs) & Larger (to feed Multi-cores & Multi-VMs from Virtualization)

SCM

NOR

NANDPCIe SSD

SATASSD

SSD segmenting into PCIe SSD Cache- as backend to DRAM &SATA SSD- as front end to HDD

Source: IMEX Research SSD Industry Report ©2011

Servers

Hybr. Storage

Page 12: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

1212

SCM: A New Storage Class Memory

• SCM (Storage Class Memory)Solid State Memory filling the gap between DRAMs & HDDsMarketplace segmenting SCMs into SATA and PCIe based SSDs

• Key Metrics Required of Storage Class MemoriesDevice - Capacity (GB), Cost ($/GB),

• Performance - Latency (Random/Block RW Access-ms); Bandwidth BW(R/W- GB/sec)

• Data Integrity - BER (Better than 1 in 10^17)• Reliability - Write Endurance (30K PE Cycles No. of writes before

death);• - Data Retention (5 Years); MTBF (2 millions of Hrs),• Environment - Power Consumption (Watts); • Volumetric Density (TB/cu.in.); Power On/Off Time

(sec),• Resistance - Shock/Vibration (g-force); Temp./Voltage Extremes

4-Corner (oC,V); Radiation (Rad)

Page 13: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

SSDs - Price Erosion & IOPS/GB

Note: 2U storage rack, • 2.5” HDD max cap = 400GB / 24 HDDs, de-stroked to 20%, • 2.5” SSD max cap = 800GB / 36 SSDs

Source: IMEX Research SSD Industry Report ©2011

0

300

600

2009 2010 2011 2012 2013 2014

Uni

ts (M

illio

ns)

0

4

8

IOPS

/GB

Enterprise HDD Units (M)

SSD Units (M)

IOPS/GB SSDs

IOPS/GB HDDs

HDD

SSD

Page 14: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

2500

2000

1500

1000

500

0

TPS

Buffer Pool (Memory) GB0 2 4 6 8 10 12 14 16 18 20

SSD PCIe

SSD SATA

HDD RAID 10

2.5 x4.0 x

2.5 x

4.0 x

Choosing SSD vs. Memory to Improve TPS

Page 15: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

PerformanceData Protection

80% of IOPsScaleCost

Data Reduction

80% of TB

Dat

a A

cces

s

Age of Data1 Day 2 Mo.1 Month 3 Mo. 1 Year1 Week 2 Yrs6 Mo.

Stor

age

Gro

wth

SSDs

Data Storage Usage Patterns –Data Access vs. Age of Data

Source:: IMEX Research - Cloud Infrastructure Report ©2009-11

Page 16: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

1616

I/O Access Frequency vs. Percent of Corporate Data

SSD• Logs• Journals• Temp Tables• Hot Tables

SSD• Logs• Journals• Temp Tables• Hot Tables

FCoE/ SAS

Arrays• Tables• Indices

• Hot Data

FCoE/ SAS

Arrays• Tables• Indices

• Hot Data

Cloud Storage

SATA• Back Up Data• Archived Data

• Offsite DataVault

Cloud Storage

SATA• Back Up Data• Archived Data

• Offsite DataVault

2% 10% 50% 100%1%% of Corporate Data

65%

75%

95%

% o

f I/O

Acc

esse

s

New Storage Hierarchy in NGDC & Clouds

Source:: IMEX Research - Cloud Infrastructure Report ©2009-11

Page 17: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

IO Bottleneck Mitigation in Virtualized Servers

Primary Storage

VM Client 1

I/O Reg.

XCLDriver

Disk Controller. SSD

Driver

VM Client 2

I/O Reg.

XCLDriver

VM Client n

I/O Reg.

XCLDriver

XCLVLUN

vSphere ESX

SSD Drive w/ESX Driver

ESX Kernel

XCLMgr.

Offloading IOPS from Primary Storage

Both Applications & Storage Run Faster

Page 18: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

1818

I/O Forensics for Auto Storage-Tiering

LBA Monitoring and Tiered Placement• Every workload has unique I/O access signature• Historical performance data for a LUN can identify

performance skews & hot data regions by LBAs

Source: IBM & IMEX Research SSD Industry Report 2011 ©IMEX 2010-11

Storage-Tiering at LBA/Sub-LUN Level

Storage-Tiered Virtualization

Physical Storage Logical Volume

SSDs Arrays

HDDs Arrays

Hot Data

Cold Data

Page 19: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

1919

Dat

a: IM

EX R

esea

rch

& Pa

nasa

s

Smart Mobile Devices

Commercial Visualization

Bioinformatics& Diagnostics

Decision Support Bus. Intelligence

Entertainment-VoD / U-Tube

Instant On Boot UpsRugged, Low Power

1GB/s, __ms

Rendering (Texture & Polygons)Very Read Intensive, Small Block I/O

10 GB/s, __ms 4 GB/s, __ms

Data WarehousingRandom IO, High OLTPM

Most Accessed VideosVery Read Intensive

1GB/s, __ms

Apps Benefitting from Improved I/O

Page 20: Solving io bottleneck

IMEXRESEARCH.COM

© 2010‐11  IMEX Research, Copying prohibited. All rights reserved.

Key Takeaways

• Solving I/O Problems• I/O Bottlenecks occur at multiple places in the Compute Stack, the largest

being at Storage I/O• SSD comes out cheaper/IOP for IO Intensive Apps• To get of Reads – Improve Indexing, archive out old data• Minimize the impact of writes – Get rid of temp tables/filesorts on slow

disks.• Compress big varchar/text/blobs

• Data Forensics and Tiered Placement• Every workload has unique I/O access signature• Historical performance data for a LUN can identify performance skews &

hot data regions by LBAs• Use Smart Tiering to identify hot LBA regions and non-disruptively migrate

hot data from HDD to SSDs.• Typically 4-8% of data becomes a candidate and when migrated to SSDs

can provide response time reduction of ~65% at peak loads