VMworld 2013: Extreme Performance Series: Storage in a Flash

Extreme Performance Series: Storage in a Flash Sankaran Sivathanu, VMware Mark Achtemichuk, VMware VSVC5603 #VSVC5603

Upload: vmworld

Post on 22-Jan-2015

DESCRIPTION

VMworld 2013 Sankaran Sivathanu, VMware Mark Achtemichuk, VMware Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare

TRANSCRIPT

Page 1: VMworld 2013: Extreme Performance Series: Storage in a Flash

Extreme Performance Series:

Storage in a Flash

Sankaran Sivathanu, VMware

Mark Achtemichuk, VMware

VSVC5603

#VSVC5603

Page 2: VMworld 2013: Extreme Performance Series: Storage in a Flash

Flash Storage

Flash is everywhere

Used extensively in smartphones, tablets, laptop computers, storage arrays, etc.

Adopting flash in enterprise servers?

• Presents an economical alternative to having a storage array

How does VMware embrace flash technology in vSphere 5.5?

• Native support for provisioning of flash resources

• Flash caching support in ESXi storage stack

• vSAN leverages flash storage for high performance

Focus:

Application performance on vSphere 5.5 when leveraging flash

Page 3: VMworld 2013: Extreme Performance Series: Storage in a Flash

Agenda

vSphere Flash Read Cache (vFRC)

Virtual SAN (vSAN)

Configurations and Troubleshooting

Page 4: VMworld 2013: Extreme Performance Series: Storage in a Flash

vFRC Overview

Page 5: VMworld 2013: Extreme Performance Series: Storage in a Flash

vFRC – Overview

vSphere 5.5 introduces vSphere Flash Infrastructure layer

• Aggregates flash storage devices into a unified flash resource

• Supports locally connected flash devices (PCIe, SAS/SATA drives, etc.)

Flash resource can be used for read caching of VM I/Os

• vSphere Flash Read Cache (vFRC)

Write Policy

• Write-through: write I/Os are written to persistent storage and vFRC simultaneously

• Large writes are filtered – avoids cache pollution with log/streaming data

Caches configured on per-VMDK basis

• Can be custom configured based on workload
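The write policy above can be illustrated with a toy model. This is a minimal sketch, not vFRC's implementation: the class name, the LRU eviction policy, and the `large_write_blocks` threshold are all assumptions made for illustration. It shows write-through behavior (every write reaches the backing store) and a filter that keeps large writes out of the cache.

```python
from collections import OrderedDict


class WriteThroughReadCache:
    """Toy per-VMDK read cache: write-through, with large writes bypassing the cache.

    LRU eviction and the large-write threshold are illustrative assumptions.
    """

    def __init__(self, capacity_blocks, large_write_blocks=64):
        self.capacity = capacity_blocks
        self.large_write_blocks = large_write_blocks  # hypothetical filter threshold
        self.cache = OrderedDict()  # block number -> data, least recently used first
        self.backing = {}           # stands in for persistent storage

    def _fill(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block

    def write(self, start_block, blocks):
        # Write-through: persistent storage is always updated.
        for i, data in enumerate(blocks):
            self.backing[start_block + i] = data
        if len(blocks) >= self.large_write_blocks:
            # Large write (log/streaming-like): skip the cache, but drop any
            # stale cached copies of the overwritten blocks.
            for i in range(len(blocks)):
                self.cache.pop(start_block + i, None)
        else:
            for i, data in enumerate(blocks):
                self._fill(start_block + i, data)

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)
            return self.cache[block], True   # cache hit
        data = self.backing[block]
        self._fill(block, data)
        return data, False                   # cache miss, block is now cached
```

A small write followed by a read of the same block is a hit; a large write is persisted but only enters the cache when it is later read.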

Page 6: VMworld 2013: Extreme Performance Series: Storage in a Flash

vFRC - Overview

[Diagram labels: Flash Read Cache, VMDKs; VM Layer, ESX Layer, Storage Layer]

Page 7: VMworld 2013: Extreme Performance Series: Storage in a Flash

vFRC Tunables

Page 8: VMworld 2013: Extreme Performance Series: Storage in a Flash

Performance Tunables in vFRC

What workloads can benefit from vFRC?

• Read-dominated I/O pattern

• High repeated access of data (E.g. 20% of working set accessed 80% of time)

• Sufficient flash capacity to hold data that is accessed repeatedly

What impacts vFRC performance?

• Cache Size – should be big enough to hold active working set of workload

• Cache Block Size – should match the dominant I/O size of workload

• Flash Device Types – PCIe flash cards vs. SSD drives

Page 9: VMworld 2013: Extreme Performance Series: Storage in a Flash

a. Cache Size

Cache sizes are specified manually when enabling vFRC for a VMDK

• Depends on working set of the application

• Should be sized to hold active working set

Inadequate cache sizes lead to increased cache miss rate

Over-sized cache leads to wastage of flash resources and sub-optimal performance during vMotion

• By default cache is migrated during vMotion

• Over-sized cache increases vMotion time

How to determine the right working set size?

• vscsiStats workload tracing

Cache size can be modified at run-time if necessary
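As a rough illustration of turning an I/O trace into a working set estimate, the sketch below counts the distinct blocks touched by a trace. The `(offset, length)` record format is a hypothetical simplification, not the actual vscsiStats trace format, and the 4096-byte block size is an arbitrary choice.

```python
def working_set_bytes(trace, block_size=4096):
    """Estimate working set size as (distinct blocks touched) x block size.

    trace: iterable of (offset_bytes, length_bytes) tuples; this record
    format is an assumption made for illustration.
    """
    touched = set()
    for offset, length in trace:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return len(touched) * block_size


# Three I/Os that touch three distinct 4 KB blocks in total.
example_trace = [(0, 8192), (4096, 4096), (1_000_000, 512)]
print(working_set_bytes(example_trace))
```

Re-accessed blocks are counted once, so a trace with heavy re-access yields a working set much smaller than the total bytes transferred.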

Page 10: VMworld 2013: Extreme Performance Series: Storage in a Flash

b. Cache Block Size

Basic unit of cache fill and cache eviction operation

Affects effective utilization of cache capacity

Bigger cache blocks lead to internal fragmentation, but consume less memory

Smaller cache blocks consume more memory (up to 2% memory space overhead)

Default cache block size is 8KB

[Figure: Memory Overhead w.r.t. vFRC block size. x-axis: Cache Block Size (KB), 4 to 512; y-axis: Memory Consumed / Cache Size (%), 0 to 2.5]

[Figure: Performance Impact of Cache Block Size. x-axis: Cache Block Size, 4KB to 512KB plus Baseline (no vFRC); y-axis: Latency in Microseconds, 0 to 1500]

Page 11: VMworld 2013: Extreme Performance Series: Storage in a Flash

b. Cache Block Size

Larger Cache Block Size (Example: 512KB cache block size for workload I/O Size of 8KB) – Internal Fragmentation

[Diagram: vFRC cache blocks, with the valid cached data occupying only part of each block]
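The 512KB-versus-8KB example can be worked through numerically. The sketch below assumes the worst case in which each cache block holds a single I/O's worth of valid data and nothing else; the function name is made up for illustration.

```python
def cache_utilization(io_size_kb, block_size_kb):
    """Fraction of a cache block holding valid data when a single I/O of
    io_size_kb occupies one block of block_size_kb (worst-case assumption)."""
    return min(io_size_kb, block_size_kb) / block_size_kb


for block_kb in (8, 64, 512):
    share = cache_utilization(8, block_kb)
    print(f"{block_kb:>3} KB block, 8 KB I/O: {share:.2%} of the block is valid data")
```

Under this assumption a 512KB block caching an 8KB I/O uses 8/512 of its space, so most of the configured cache capacity would hold no useful data.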

Page 12: VMworld 2013: Extreme Performance Series: Storage in a Flash

c. Flash Device Type

VENDOR SPECIFICATIONS

High Performance

• Up to 750k Random Read IOPS

• Up to 3 GB/s Read Bandwidth

• Read Latency – 75 microseconds

• Write Latency – 15 microseconds

Low Cost

• 30k – 40k Random Read IOPS

• 200 – 270 MB/s Read Bandwidth

• Read Latency – 75 microseconds

• Write Latency – 90 microseconds

Page 13: VMworld 2013: Extreme Performance Series: Storage in a Flash

vFRC Performance

Page 14: VMworld 2013: Extreme Performance Series: Storage in a Flash

vFRC Performance – Applications

What workloads can benefit from vFRC?

• Read-dominated I/O pattern

• High repeated access of data (E.g. 20% of working set accessed 80% of time)

• Sufficient flash capacity to hold data that is accessed repeatedly

Applications Considered:

• Data Warehousing (Swingbench DSS)

• Database Transactions (DVDstore)

• Real-World Enterprise Server Workloads (Publicly available I/O Traces)

Page 15: VMworld 2013: Extreme Performance Series: Storage in a Flash

1. Data Warehousing Application

Decision Support System [TPC-H]

Benchmark: Swingbench 2.4 using ‘Sales History’ schema on Oracle 11g R2 database

[Diagram: Swingbench DSS benchmark on RHEL 6.4 VM sends queries to, and receives results from, Oracle 11g R2 on Windows 2008 Server VM; vFRC; EMC VNX 5700 with 1TB LUN, RAID5 over 5 FC 15k RPM HDDs]

Page 16: VMworld 2013: Extreme Performance Series: Storage in a Flash

1. Data Warehousing Application

Workload: Read dominated, High re-access rate

vFRC Configuration: 8GB Cache Size and 8KB Cache block size

[Figure: Transaction Count. x-axis: Transaction Type (SRMC, SCMC, PSCR, SMA, PPSC, TSQ, SQC); y-axis: # of transactions, 0 to 12000; series: Baseline, vFRC]

Page 17: VMworld 2013: Extreme Performance Series: Storage in a Flash

1. Data Warehousing Application

Workload: Read dominated, High re-access rate

vFRC Configuration: 8GB Cache Size and 8KB Cache block size

Up to 84% improvement in average throughput

Up to 2X reduction in latency

[Figure: Transactions Per Minute (TPM). Baseline: 61.7; vFRC: 112.9]

[Figure: Average Response Time (s). Baseline: 20.389; vFRC: 10.859]

Page 18: VMworld 2013: Extreme Performance Series: Storage in a Flash

2. Database Transaction Application

Benchmark : DVDStore

• Simulates online e-commerce site operations

• Database : MS SQL Server 2008

• Database Size : 15 GB

Workload Characteristics

• 60% reads

• Mostly random I/Os

• Predominant I/O size : 8KB

VM Configuration

• 8 vCPUs, 8GB Memory

• 25GB Database disk, 10GB Log disk

Storage Array

• VNX 5700, 1TB LUN – RAID5 over 5 FC 15k RPM disk drives

Page 19: VMworld 2013: Extreme Performance Series: Storage in a Flash

2. Database Transaction Application

[Figure: Orders Per Minute. Baseline: 8802; vFRC-10GB: 8937; vFRC-15GB: 12319]

Up to 39% improvement in application throughput
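The throughput gain can be recomputed from the orders-per-minute values on this slide. The helper name below is illustrative; the three numbers are the ones shown in the chart.

```python
def percent_improvement(baseline, measured):
    """Relative gain of measured over baseline, in percent."""
    return (measured - baseline) / baseline * 100.0


orders_per_minute = {"Baseline": 8802, "vFRC-10GB": 8937, "vFRC-15GB": 12319}
base = orders_per_minute["Baseline"]
for name, opm in orders_per_minute.items():
    if name != "Baseline":
        print(f"{name}: {percent_improvement(base, opm):+.1f}%")
```

The 10GB cache gives only a small gain, while the 15GB cache, which is as large as the 15 GB database, gives a gain of just under 40%. This matches the earlier guidance that the cache should hold the active working set.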

Page 20: VMworld 2013: Extreme Performance Series: Storage in a Flash

3. Enterprise Server I/O Traces

a. Hardware Monitoring Server Workload

• Trace from servers that log data from multiple hardware monitoring programs across a datacenter

• Collected at Microsoft Research, Cambridge*

• Trace replayed using IOAnalyzer

95% reads

vFRC size – 4GB

vFRC block size – 4KB

vFRC hit percentage – 85%

[Figure: Average Latency (ms). Baseline: 1.23; vFRC: 0.321]

* Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron. 2008. Write off-loading: Practical power management for enterprise storage. Trans. Storage 4, 3, Article 10 (November 2008).

Page 21: VMworld 2013: Extreme Performance Series: Storage in a Flash

3. Enterprise Server I/O Traces

b. Proxy Server Workload

• Trace from a web proxy server

• Collected at Microsoft Research, Cambridge*

• Trace replayed using IOAnalyzer

67% reads

vFRC Size : 16GB

vFRC block Size : 4KB

vFRC hit percentage : 83%

[Figure: Average Latency (ms). Baseline: 1.357; vFRC: 0.612]

* Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron. 2008. Write off-loading: Practical power management for enterprise storage. Trans. Storage 4, 3, Article 10 (November 2008).

Page 22: VMworld 2013: Extreme Performance Series: Storage in a Flash

vMotion Performance with vFRC

vFRC is fully supported by vMotion and other vSphere features

vMotion behavior of vFRC-enabled VM

• VM caches are migrated by default

• Option to drop cache during vMotion

Migrating cache preserves application performance gains

• Consumes more network bandwidth

• Increased vMotion time

Dropping cache during vMotion leads to a temporary dip in application performance gains

• No extra overhead in vMotion

• Re-warms up the cache at destination
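To weigh the trade-off above, a back-of-the-envelope estimate of the extra copy time for a migrated cache can help. This sketch assumes the whole configured cache is transferred and that the link is the only bottleneck. Both are assumptions, since the slides do not say how much cache data vMotion actually moves, and the `efficiency` parameter is a hypothetical knob for protocol overhead.

```python
def cache_transfer_seconds(cache_gb, link_gbps, efficiency=1.0):
    """Rough time to copy cache_gb gigabytes over a link_gbps network link.

    efficiency is the assumed usable fraction of link bandwidth (1.0 = ideal).
    """
    gigabits = cache_gb * 8
    return gigabits / (link_gbps * efficiency)


# An 8 GB cache over an ideal 10 Gb/s link versus a 1 Gb/s link.
print(cache_transfer_seconds(8, 10))
print(cache_transfer_seconds(8, 1))
```

Under these assumptions the added time scales linearly with cache size, which is one reason an over-sized cache lengthens vMotion.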

Page 23: VMworld 2013: Extreme Performance Series: Storage in a Flash

vFRC Performance Best Practices

Page 24: VMworld 2013: Extreme Performance Series: Storage in a Flash

vFRC Configuration Guidelines

Cache size may be configured based on working set of workload

• Start with about 20% of VMDK size, and monitor vFRC stats to re-configure it

Cache block size must match dominant I/O size of workload

• Workload I/O size is not equal to VM I/O Size!

vFRC performs better with PCIe flash devices

Decide on cache migration behavior during vMotion based on:

• Criticality of application performance

• Time taken for vMotion

• Network bandwidth availability

Page 25: VMworld 2013: Extreme Performance Series: Storage in a Flash

Making sense of vscsiStats and vFRC Stats

vscsiStats can be used to know more about the workload

• IO Length Histogram

• Read Write Ratio

• I/O trace to compute working set size

vFRC stats provide information about cache effectiveness

• numBlocks – total number of cache blocks for a VMDK

• numBlocksCurrentlyCached – number of cache blocks that actually contain data

• Evict:avgNumBlocksPerOp – average number of evictions

• avgCacheLatency – average device latency of flash resource

• maxCacheLatency – maximum device latency of flash resource

• cacheHitPercentage – percentage of read cache hits
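The workload characteristics listed above (dominant I/O size and read/write ratio) can be derived from per-I/O records. The sketch below is only an illustration: the `(op, length)` record format is hypothetical and does not reflect the actual output of vscsiStats.

```python
from collections import Counter


def summarize_ios(ios):
    """Return (dominant I/O size in bytes, fraction of I/Os that are reads).

    ios: iterable of (op, length_bytes) with op 'R' or 'W'; this record
    format is an assumption made for illustration.
    """
    ios = list(ios)
    sizes = Counter(length for _, length in ios)
    dominant_size, _ = sizes.most_common(1)[0]
    reads = sum(1 for op, _ in ios if op == "R")
    return dominant_size, reads / len(ios)


sample = [("R", 8192)] * 6 + [("W", 8192)] * 3 + [("W", 65536)]
size, read_fraction = summarize_ios(sample)
print(f"dominant I/O size: {size} bytes, reads: {read_fraction:.0%}")
```

The dominant size is a candidate for the cache block size, and the read fraction indicates whether the workload is read-dominated enough to benefit from a read cache.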

Page 26: VMworld 2013: Extreme Performance Series: Storage in a Flash

vFRC Sizing Decisions based on vFRC Stats

Using vFRC stats to make sizing decisions

• numBlocksCurrentlyCached < numBlocks : cache size may be reduced

• numBlocksCurrentlyCached = numBlocks and Evict:avgNumBlocksPerOp is high : cache size may be inadequate

• maxCacheLatency is very high : may be because of spike in device latency, which may mean device has worn out

• cacheHitPercentage is high and Evict:avgNumBlocksPerOp is low : means cache is correctly configured

For more detailed information, please refer to our performance whitepaper http://www.vmware.com/files/pdf/techpaper/vfrc-perf-vsphere55.pdf
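The sizing rules on this slide can be written as a small rule check. This is a sketch under stated assumptions: the stat names come from the slides, but the dictionary input, the numeric thresholds for "high" and "very high", and the microsecond unit for latency are all hypothetical choices for illustration.

```python
def vfrc_sizing_advice(stats, high_evictions=1.0, high_hit_pct=80.0,
                       high_latency_us=1000.0):
    """Apply the slide's sizing rules to a dict of vFRC stats.

    All three thresholds are illustrative assumptions, not documented values.
    """
    advice = []
    full = stats["numBlocksCurrentlyCached"] >= stats["numBlocks"]
    evicting = stats["Evict:avgNumBlocksPerOp"] >= high_evictions
    if not full:
        advice.append("cache not full: size may be reduced")
    if full and evicting:
        advice.append("cache full with high evictions: size may be inadequate")
    if stats["maxCacheLatency"] >= high_latency_us:
        advice.append("very high max latency: check flash device health")
    if stats["cacheHitPercentage"] >= high_hit_pct and not evicting:
        advice.append("high hit rate with low evictions: cache looks correctly configured")
    return advice


example = {
    "numBlocks": 1000,
    "numBlocksCurrentlyCached": 1000,
    "Evict:avgNumBlocksPerOp": 5.0,
    "maxCacheLatency": 200.0,
    "cacheHitPercentage": 40.0,
}
print(vfrc_sizing_advice(example))
```

A cache that has only recently been enabled may simply not be warm yet, so the "not full" rule is best applied after the workload has run for a while.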

Page 27: VMworld 2013: Extreme Performance Series: Storage in a Flash

Agenda

vSphere Flash Read Cache (vFRC)

Virtual SAN (vSAN)

Configurations and Troubleshooting

Page 28: VMworld 2013: Extreme Performance Series: Storage in a Flash

Virtual SAN - Architecture

Each ESXi host contributes:

• Flash storage to absorb IOPS

• Hard disk drives to provide capacity

Virtual SAN aggregates these resources from multiple servers in a vSphere cluster

• Provides a global datastore for VMs in the cluster

HA/DRS ensures that the VM restarts on a host crash

Virtual SAN objects can be split into multiple components for performance and data protection

• Governed by storage policies

[Diagram: VSAN cluster of three ESX hosts; a VM's virtual disk is stored as a VSAN object with replica-1, replica-2, and a Witness]

Page 29: VMworld 2013: Extreme Performance Series: Storage in a Flash

Experiment Setup

Hardware

• 16 core 2.9 GHz Dell R720 machines

• 2 x Intel PCIe R910 SSD – 200GB (1 PCIe slot)

• 12 x 10K RPM Seagate SAS disks

• 10G vSAN dedicated network, 1G for VM network

VSAN Configuration

• 2 x Disk groups per machine, 6 x Disks per disk group

• hostFailuresToTolerate 1, stripeWidth 1

Workload Characteristics

• ViewPlanner 3.0 Standard benchmark with 2 sec think-time (heavy user)

• ViewPlanner Group A : CPU intensive & Group B: I/O intensive apps.

• 1900 x 1200 resolution, PCoIP

• Windows 7 desktops and Windows XP clients

• VDI workload is known to be CPU intensive but sensitive to I/O latency.
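For the hostFailuresToTolerate 1 setting listed under VSAN Configuration, the footprint of a mirrored object can be sketched as follows. The sketch assumes mirroring with n + 1 replicas and a minimum of 2n + 1 hosts for n tolerated failures, and it ignores witness components and metadata overhead; the function name and return format are illustrative.

```python
def mirroring_footprint(vmdk_gb, host_failures_to_tolerate=1):
    """Replica count, minimum host count, and raw capacity for a mirrored object.

    Assumes n + 1 replicas and 2n + 1 hosts for n tolerated host failures;
    witness and metadata overhead are not modeled.
    """
    replicas = host_failures_to_tolerate + 1
    min_hosts = 2 * host_failures_to_tolerate + 1
    return {
        "replicas": replicas,
        "min_hosts": min_hosts,
        "raw_gb": vmdk_gb * replicas,
    }


print(mirroring_footprint(100, host_failures_to_tolerate=1))
```

With one tolerated failure this gives two replicas on at least three hosts, consistent with the replica-1, replica-2, and Witness layout in the architecture diagram and with the smallest cluster tested being 3 nodes.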

Page 30: VMworld 2013: Extreme Performance Series: Storage in a Flash

Virtual SAN Delivers IOPS Required by VDI

• Virtual SAN can meet the IOPS required by VDI workload

Page 31: VMworld 2013: Extreme Performance Series: Storage in a Flash

Virtual SAN Scale

[Figure: Virtual SAN scale. y-axis: Number of Heavy VDI Users, 0 to 800. 3 node: 275; 5 node: 460; 7 node: 667. Series: VSAN, with linear trend line]
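A quick way to check how close the scaling is to linear is to divide the user counts from this chart by the node counts. The variable and function names below are illustrative; the numbers are those shown on the slide.

```python
def users_per_node(results):
    """Map each cluster size to the number of heavy VDI users per node."""
    return {nodes: users / nodes for nodes, users in results.items()}


users_by_nodes = {3: 275, 5: 460, 7: 667}
for nodes, per_node in users_per_node(users_by_nodes).items():
    print(f"{nodes} nodes: {per_node:.1f} users per node")
```

The per-node figure stays in the low-to-mid 90s across all three cluster sizes, which is what near-linear scaling looks like.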

Page 32: VMworld 2013: Extreme Performance Series: Storage in a Flash

Group A Score Comparison

[Figure: Group A, Avg Application Latency, axis 0 to 1.2; series: VSAN, SAN]

• Impact to Group A application latencies is marginal

• Virtual SAN uses very few cycles of Host CPU.

Page 33: VMworld 2013: Extreme Performance Series: Storage in a Flash

Group B Score Comparison

[Figure: Group B, Avg Application Latency, axis 0 to 7; series: VSAN, All-Flash-SAN]

• Group B application latencies are close to All-Flash-SAN

• Virtual SAN can meet the IOPS required by VDI workload

Page 34: VMworld 2013: Extreme Performance Series: Storage in a Flash

VSAN VDI Consolidation compared to physical SAN array

VSAN performs better than a typical mid-range FC storage array

• vSAN benefits from local flash storage that provides high performance

Impact of VSAN CPU consumption on application performance is low

Physical SAN array is not required to run VDI workload

[Figure: VDI Consolidation Ratio. x-axis: Number of Nodes (Servers): 3, 5, 7; y-axis: Number of VMs, 0 to 800; series: Mid-range FC Array, vSAN, All-Flash FC Array]

Page 35: VMworld 2013: Extreme Performance Series: Storage in a Flash

Agenda

vSphere Flash Read Cache (vFRC)

Virtual SAN (vSAN)

Configurations and Troubleshooting

Page 36: VMworld 2013: Extreme Performance Series: Storage in a Flash

Define the Performance Issue

Understand Application Function & Architecture

• At a minimum know what your application does and what it’s dependent on

Select Application KPIs

• Application performance must be measured using an application counter (tps, response time, etc.) and not virtual resource consumption

Define Success Criteria

• With your app owner, define the level the application KPIs must reach for the application to be considered performant

Comparisons must be Apples-to-Apples

• Any changes to infrastructure (physical or virtual) create comparison challenges

Now the Gap is Identified, Begin Troubleshooting

• With an understanding of the requirements and current deficiency, you can now begin to investigate and/or tune

Page 37: VMworld 2013: Extreme Performance Series: Storage in a Flash

Disk I/O Latencies

[Diagram: I/O path through Application, Guest OS, VMM, vSCSI, ESX Storage Stack, Driver, HBA, Fabric, and Array SP, annotated with the latency counters GAVG, KAVG, QAVG, and DAVG]

* KAVG = GAVG – DAVG

Time spent in ESX storage stack is minimal; for all practical purposes KAVG ~= QAVG

In a well configured system QAVG should be zero

Page 38: VMworld 2013: Extreme Performance Series: Storage in a Flash

Key Indicators – Investigative Thresholds

Kernel Latency Average (KAVG)

• This counter tracks the latencies of I/O passing through the kernel

• Investigation Threshold: 1ms

Device Latency Average (DAVG)

• This is the latency seen at the device driver level. It includes the roundtrip time between the HBA and the storage

• Investigation Threshold: 15-20ms, lower is better, some spikes okay

Aborts (ABRT/s)

• The number of commands aborted per second

• Investigation Threshold: 1
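The investigative thresholds above, together with the relation KAVG = GAVG – DAVG from the previous slide, can be combined into a simple check. This is a sketch: the function name and flag strings are made up, the inputs are assumed to be values read off esxtop by hand, and the lower end (15 ms) of the 15-20 ms DAVG range is used as the default limit.

```python
def storage_flags(gavg_ms, davg_ms, aborts_per_s,
                  kavg_limit_ms=1.0, davg_limit_ms=15.0, abort_limit=1.0):
    """Derive KAVG and list which counters reach their investigation threshold."""
    kavg_ms = gavg_ms - davg_ms  # KAVG = GAVG - DAVG
    flags = []
    if kavg_ms >= kavg_limit_ms:
        flags.append("KAVG")
    if davg_ms >= davg_limit_ms:
        flags.append("DAVG")
    if aborts_per_s >= abort_limit:
        flags.append("ABRT/s")
    return kavg_ms, flags


kavg, flags = storage_flags(gavg_ms=22.0, davg_ms=20.0, aborts_per_s=0.0)
print(kavg, flags)
```

A flagged counter is only a prompt to investigate; as the slide notes for DAVG, some spikes are acceptable.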

Page 39: VMworld 2013: Extreme Performance Series: Storage in a Flash

Disk I/O Queues

[Diagram: I/O path through Application, Guest OS, VMM, vSCSI, ESX Storage Stack, Driver, HBA, Fabric, and Array SP, annotated with the queues listed below; part of the diagram is marked "Reported in esxtop"]

GQLEN – Guest Queue

AQLEN – Adapter Queue

WQLEN – World Queue

DQLEN – Device / LUN Queue

SQLEN – Array SP Queue

DQLEN can change dynamically when SIOC is enabled

Page 40: VMworld 2013: Extreme Performance Series: Storage in a Flash

Performance Technical Resources

Performance Technical Papers

• http://www.vmware.com/files/pdf/techpaper/vfrc-perf-vsphere55.pdf

• http://www.vmware.com/resources/techresources/cat/91,96

Performance Best Practices

• http://www.vmware.com/pdf/Perf_Best_Practices_vSphere4.0.pdf

• http://www.vmware.com/pdf/Perf_Best_Practices_vSphere4.1.pdf

• http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf

• http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.1.pdf

Troubleshooting Performance Related Problems in vSphere Environments

• http://communities.vmware.com/docs/DOC-14905 (vSphere 4.1)

• http://communities.vmware.com/docs/DOC-19166 (vSphere 5)

• http://communities.vmware.com/docs/DOC-23094 (vSphere 5.x with vCOps)

Page 41: VMworld 2013: Extreme Performance Series: Storage in a Flash

Performance Community Resources

Performance Technology Pages

• http://www.vmware.com/technical-resources/performance/resources.html

Technical Marketing Blog

• http://blogs.vmware.com/vsphere/performance/

Performance Engineering Blog VROOM!

• http://blogs.vmware.com/performance

Performance Community Forum

• http://communities.vmware.com/community/vmtn/general/performance

Virtualizing Business Critical Applications

• http://www.vmware.com/solutions/business-critical-apps/

Page 42: VMworld 2013: Extreme Performance Series: Storage in a Flash

Extreme Performance Series Sessions

Extreme Performance Series:

vCenter of the Universe – Session #VSVC5234

Monster Virtual Machines – Session # VSVC4811

Network Speed Ahead – Session # VSVC5596

Storage in a Flash – Session # VSVC5603

Big Data:

Virtualized SAP HANA Performance, Scalability and Practices – Session # VAPP5591

Hands on Labs:

HOL-SDC-1304 – Optimize vSphere Performance includes vFRC

Page 43: VMworld 2013: Extreme Performance Series: Storage in a Flash

Other VMware Activities Related to This Session

HOL:

HOL-SDC-1308

Virtual Storage Solutions

Group Discussions:

VSVC1001-GD

Performance with Mark Achtemichuk

VSVC5603

Page 44: VMworld 2013: Extreme Performance Series: Storage in a Flash

THANK YOU
