
Page 1: QoS, QoS Baby

QoS, QoS Baby

OpenStack Barcelona 2016

Page 2: QoS, QoS Baby

2

Speakers

Anne McCormick, Software Engineer, Cisco
@amccormi4

Robert Starmer, CTO & Principal, Kumulus Technologies
@rstarmer

Alka Sathnur, Ops QA, Cisco
@alkasat12

Page 3: QoS, QoS Baby

3

Topics

• Traditional QoS Concepts

• Current/Future OpenStack QoS

• Beyond the Network

• Compute/Storage Bottlenecks and Differentiation

• Use Case

• Q/A

Page 4: QoS, QoS Baby

4

Traditional QoS Concepts

Page 5: QoS, QoS Baby

5

Quality of service (QoS) is the overall performance of a telephony or computer network, particularly the performance seen by the users of the network.

- Wikipedia

Page 6: QoS, QoS Baby

6

What is Quality of Service?

• Network-centric view of a resource
  • Availability
  • Reliability

• Provides a model for understanding and manipulating network impact on service delivery
  • Jitter, latency, and loss are all important aspects of a communications channel

Characteristic  Description
Bandwidth       Rate at which bits are moved across a network, often in raw bits/s
Latency         Delay between sending a bit (source) and receiving it (destination)
Jitter          Change in latency over time
Reliability     Level of loss of bits across a network

Page 7: QoS, QoS Baby

7

“QoS” in OpenStack

Resource         Gross        Fine            Name
Compute          Reservation  Resource share  flavor
Network          Rate limits  Queueing        qos
Storage          Rate limits  Queueing        iops
Storage network  Rate limits  Queueing        qos

Page 8: QoS, QoS Baby

8

QoS in the “physical” Network

• Initial QoS was managed by routers
  • Committed Information Rate
  • Routers matched bandwidth between different networks

• Handling contention led to QoS policies, or classes
  • Priority
  • Multi-queue

• And multiple models of handling those queues
  • FIFO
  • WRED

Page 9: QoS, QoS Baby

9

QoS in Layer 2 Networks

• L2 networks tend to try to avoid ever storing packets
  • Less chance to manage different flows of traffic

• But L2 networks really aren’t L2 any more
  • So we can classify traffic and, if necessary, queue it
  • Really helps when you have multiple types of traffic, like storage and voice or video, on the same network

Page 10: QoS, QoS Baby

10

Current/Future OpenStack QoS

Page 11: QoS, QoS Baby

11

QoS in the early days

• RXTX Factor
  • Nova network-based “sharing” algorithm
  • Based on nova flavor metadata

• Neutron Mitaka
  • ML2 extension
  • SR-IOV, OVS, Linux Bridge “bandwidth” limitations (e.g. rx/tx factor)

• Neutron Newton
  • As with Mitaka
  • Adds DSCP marking ← This is a big deal (see the sketch below)
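As a rough illustration of these Mitaka/Newton-era knobs, below is a minimal sketch using the openstacksdk Python bindings; the cloud name, port ID, and limit values are placeholders, and the exact rule attributes can vary by release.

    import openstack

    # Connect using credentials from clouds.yaml ('mycloud' is a placeholder)
    conn = openstack.connect(cloud='mycloud')

    # QoS policy with a bandwidth-limit rule (the Mitaka-era capability)
    policy = conn.network.create_qos_policy(name='bw-limit-policy')
    conn.network.create_qos_bandwidth_limit_rule(
        policy, max_kbps=3000, max_burst_kbps=300)

    # Newton adds DSCP marking; e.g. mark the flow AF31 (DSCP 26)
    conn.network.create_qos_dscp_marking_rule(policy, dscp_mark=26)

    # Attach the policy to a Neutron port ('PORT_ID' is a placeholder)
    conn.network.update_port('PORT_ID', qos_policy_id=policy.id)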

Page 12: QoS, QoS Baby

12

Rate Limiting

• Seems like a straightforward approach:
  • Like non-oversubscribed processors
  • Sharing fixed IOPS limits on a storage array

• Rate limiting flows or specific services can have unintended consequences:
  • Dramatic impact to “goodput” vs. “throughput”
  • Particularly bursty applications can become unstable

Page 13: QoS, QoS Baby

13

DSCP Marking

• Let’s help the network out
  • Mark packets so that the network infrastructure has better information to go on
  • Execute marking at the application/OS level (in the VM), or execute marking at the switch input (see the sketch below)

• Not a panacea
  • May still have “goodput” impact
  • At least provides a better interaction for determining who gets access to the available bandwidth resources
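To make application-level marking concrete, here is a minimal sketch of setting the DSCP bits from inside the VM using Python’s standard socket options; the destination address and the choice of EF are illustrative.

    import socket

    # DSCP occupies the upper 6 bits of the IP TOS byte.
    # EF (Expedited Forwarding) is DSCP 46, so TOS = 46 << 2 = 0xB8.
    DSCP_EF = 46

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)

    # Datagrams sent from this socket now carry the EF marking,
    # which switches and routers can use to prioritize the flow.
    sock.sendto(b'hello', ('192.0.2.10', 5001))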

Page 14: QoS, QoS Baby

14

Beyond the Network

So you got the traffic there faster… now what?

Compute and storage bottlenecks!

Page 15: QoS, QoS Baby

15

Compute Bottlenecks

… and how to alleviate them

Page 16: QoS, QoS Baby

16

[Diagram: nova-scheduler on the Controller placing VMs across Compute1, Compute2, … ComputeN]

Page 17: QoS, QoS Baby

17

[Diagram: Controller running nova-scheduler; two compute nodes hosting VeryImportant™ VMs, one of them alongside a CPU Hog VM]

Page 18: QoS, QoS Baby

18

[Diagram: Controller running nova-scheduler; the two compute nodes now hosting only the VeryImportant™ VMs]

Page 19: QoS, QoS Baby

19

Cost of CPU Sharing/Context Switching

Ran a simple OpenStack multicast iperf test:

• Network highly optimized for multicast (SR-IOV port, multiple RX queues with maximum queue size, RSS, ARFS, QoS)

• iperf receiver on a tenant VM, receiving a steady 800 Mbits/sec multicast stream

• When context switching, the receiver experienced up to 0.2% packet loss, particularly when switching across NUMA nodes (as opposed to switching within the same node)

Page 20: QoS, QoS Baby

20

Compute Resource Differentiation/Prioritization

• Host aggregates – define separate groups of compute hosts

• Flavors – define hardware needs such as number of cores, CPU capabilities/limits, affinity/anti-affinity, etc., via host filters

• CPU pinning/NUMA awareness – pin VMs to dedicated cores to prevent context switches across NUMA nodes (see the sketch below)
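As one way to wire these together, here is a minimal sketch using python-novaclient; the endpoint, credentials, flavor sizing, aggregate name, and host name are all placeholders, and the extra specs take effect only on hosts configured for pinning.

    from keystoneauth1.identity import v3
    from keystoneauth1 import session
    from novaclient import client

    # Placeholder credentials and endpoint
    auth = v3.Password(auth_url='http://controller:5000/v3',
                       username='admin', password='secret',
                       project_name='admin',
                       user_domain_id='default', project_domain_id='default')
    nova = client.Client('2', session=session.Session(auth=auth))

    # Flavor whose VMs get dedicated (pinned) CPUs on a single NUMA node
    flavor = nova.flavors.create('pinned.large', ram=8192, vcpus=4, disk=40)
    flavor.set_keys({'hw:cpu_policy': 'dedicated', 'hw:numa_nodes': '1'})

    # Host aggregate grouping the compute hosts reserved for pinned workloads
    agg = nova.aggregates.create('pinned-hosts', 'nova')
    nova.aggregates.set_metadata(agg, {'pinned': 'true'})
    nova.aggregates.add_host(agg, 'compute1')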

Page 21: QoS, QoS Baby

21

Storage Bottlenecks

… and how to alleviate them

Page 22: QoS, QoS Baby

22

[Diagram: multiple Compute hosts driving I/O traffic to storage nodes Storage1, Storage2, … StorageN; the VeryImportant™ VM’s I/O traffic contends with an I/O Hog]

Page 23: QoS, QoS Baby

23

Cost of Storage Contention

Ran a simple OpenStack read/write I/O test:

• Two VMs running on the same host, using different volumes

• 3 Ceph nodes, active/active/active

• When reading simultaneously, both VMs experienced an 80 MB/s drop in read rate

• When writing simultaneously, both experienced a 100 MB/s drop in write rate

Page 24: QoS, QoS Baby

24

Storage Resource Differentiation/Prioritization

• Host aggregates – define separate groups/clusters of storage servers

• Flavors – define I/O bandwidth limits for VMs (outbound traffic)

• Differentiate at the storage backend (see the sketch below)
  • Cinder has QoS specs, volume types, and priority (more IOPS to particular volumes)
  • Ceph has storage types and the ability to limit IOPS to certain spindles using CRUSH maps
  • AFAIK, Swift does not have the ability to differentiate/prioritize storage resources at the backend
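To make the Cinder side concrete, here is a minimal sketch using python-cinderclient; the spec names and IOPS values are illustrative, the credentials are placeholders, and which keys are honored depends on the backend driver.

    from keystoneauth1.identity import v3
    from keystoneauth1 import session
    from cinderclient import client

    # Placeholder credentials and endpoint
    auth = v3.Password(auth_url='http://controller:5000/v3',
                       username='admin', password='secret',
                       project_name='admin',
                       user_domain_id='default', project_domain_id='default')
    cinder = client.Client('2', session=session.Session(auth=auth))

    # QoS specs capping IOPS, enforced at the hypervisor ('front-end') side
    qos = cinder.qos_specs.create('limited-iops', {
        'consumer': 'front-end',
        'read_iops_sec': '2000',
        'write_iops_sec': '1000',
    })

    # Volume type that tenants can request; associate the QoS specs with it
    vtype = cinder.volume_types.create('bronze')
    cinder.qos_specs.associate(qos, vtype.id)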

Page 25: QoS, QoS Baby

25

Conclusion

• Network QoS is only a partial solution

• To guarantee resources for mission-critical applications and data, a solution across all cloud resources (network, compute, storage) must be used

• It is complicated to get this right across all resources, but it can be done

Page 26: QoS, QoS Baby

26

Use Case

Page 27: QoS, QoS Baby

27

Real-World Use Case

Bringing an existing Content Delivery Network composed of bare-metal cache nodes onto an OpenStack platform

Page 28: QoS, QoS Baby

28

Content Delivery Network

• A Delivery Service is a software structure in OMD that maps an origin source to Traffic Servers by Fully Qualified Domain Name (FQDN). The FQDN is in the Request URI from the client media player. Cache Groups can belong to a single Delivery Service or to multiple Delivery Services.

• A Cache Group is a logical grouping for HA. Each cache is typically located in a different location to provide site-level redundancy. Each cache in a cache group is associated with a single set of geo coordinates.

Enables:
• Multiple Content Sources
• Per-Content-Source content cache/storage
• Intelligent load balancing

[Diagram: Origin Servers feeding a mid-tier cache of Traffic Servers, which feeds Edge Cache Groups of Traffic Servers; Control Server, CDN Monitor, CDN Analytics, and Orchestration sit alongside]

Page 29: QoS, QoS Baby

29

Content Delivery Network (continued)

[Diagram: as on the previous slide, with a Director added and the edge tier shown as Edge Cache Groups and Storage Clusters]

Page 30: QoS, QoS Baby

30

Use Case Summary

Dynamically expanding a Content Delivery Network is possible, provided the Orchestrator ensures that network, compute, and storage give top priority to the application traffic, thereby meeting the three goals: Performance, Availability, and Capacity.

Page 31: QoS, QoS Baby

Questions?

Page 32: QoS, QoS Baby

32

Thank you!