
MySQL, Containers, & Ceph
Red Hat Summit, 2016

WHOIS

Kyle Bader, Senior Solutions Architect, Red Hat

Yves Trudeau, Principal Architect, Percona

OPENSTACK CINDER DRIVER TRENDS

[Chart: % of OpenStack deployments using Ceph RBD for Cinder vs. % using LVM for Cinder, November 2014 through April 2016]

OPENSTACK APP FRAMEWORK TRENDS

[Bar chart: share of OpenStack deployments by app framework (LAMP, Java, MEAN, WISA, Rails, Other), October 2015 vs. April 2016]

MySQL ON CEPH STORAGE: CLOUD OPS EFFICIENCY

• Shared, elastic storage pool

• Dynamic DB placement

• Flexible volume resizing

• Live instance migration

• Backup to object pool

• Read replicas via copy-on-write snapshots (see the rbd sketch below)
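As a minimal sketch of the last point, a read replica's data volume can be seeded from a copy-on-write RBD clone rather than a full copy; the pool, image, and snapshot names below are illustrative, not taken from this study:

  rbd snap create volumes/mysql-master@replica-base     # point-in-time snapshot of the master's volume
  rbd snap protect volumes/mysql-master@replica-base    # clones require a protected snapshot
  rbd clone volumes/mysql-master@replica-base volumes/mysql-replica1   # thin, copy-on-write child image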

MYSQL-ON-CEPH PRIVATE CLOUD: FIDELITY TO A MYSQL-ON-AWS EXPERIENCE

• Hybrid cloud requires public/private cloud commonalities

• Developers want DevOps consistency

• Elastic block storage, Ceph RBD vs. AWS EBS

• Elastic object storage, Ceph RGW vs. AWS S3

• Users want deterministic performance

HEAD-TO-HEAD PERFORMANCE

30 IOPS/GB: AWS EBS P-IOPS TARGET

HEAD-TO-HEAD LAB TEST ENVIRONMENTS

AWS environment:
• EC2 r3.2xlarge and m4.4xlarge
• EBS Provisioned IOPS and GP-SSD
• Percona Server

Ceph environment:
• Supermicro servers
• Red Hat Ceph Storage RBD
• Percona Server

SUPERMICRO CEPH LAB ENVIRONMENT

OSD Storage Server Systems: 5x SuperStorage SSG-6028R-OSDXXX
• Dual Intel Xeon E5-2650v3 (10 cores each)
• 32GB DDR3 SDRAM
• 2x 80GB boot drives
• 4x 800GB Intel DC P3700 (hot-swap U.2 NVMe)
• 1x dual-port 10GbE network adaptor (AOC-STGN-i2S)
• 8x Seagate 6TB 7200 RPM SAS (unused in this lab)
• Mellanox 40GbE network adaptor (unused in this lab)

MySQL Client Systems: 12x SuperServer 2UTwin2 nodes
• Dual Intel Xeon E5-2670v2 (cpuset limited to 8 or 16 vCPUs)
• 64GB DDR3 SDRAM

Storage Server Software: Red Hat Ceph Storage 1.3.2, Red Hat Enterprise Linux 7.2, Percona Server

Topology: 5x OSD nodes, 12x client nodes, and monitor nodes on shared 10G SFP+ networking

SYSBENCH BASELINE ON AWS EC2 + EBS

[Bar chart: Sysbench requests/sec for P-IOPS m4.4xl, P-IOPS r3.2xl, and GP-SSD r3.2xl instances under 100% read and 100% write workloads]

SYSBENCH REQUESTS PER MYSQL INSTANCE

[Bar chart: Sysbench requests/sec per MySQL instance for AWS P-IOPS m4.4xl vs. a Ceph cluster hosting 1x "m4.4xl"-equivalent instance (14% capacity) and 6x instances (87% capacity), under 100% read, 100% write, and 70/30 R/W workloads]

CONVERTING SYSBENCH REQUESTS TO IOPS: READ PATH

Sysbench read -> X% of read requests served from the InnoDB buffer pool -> IOPS = (read requests - X%)

CONVERTING SYSBENCH REQUESTS TO IOPS: WRITE PATH

Sysbench write -> 1x read: X% served from the InnoDB buffer pool -> read IOPS = (read requests - X%)

Sysbench write -> 1x write: redo log and doublewrite buffer -> write IOPS = (write requests * 2.3)
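As a worked example (the 90% buffer-pool hit rate is an assumed figure, not one measured here): 10,000 sysbench read requests/sec would translate to roughly 10,000 x (1 - 0.90) = 1,000 backend read IOPS, while 1,000 sysbench write requests/sec would translate to roughly 1,000 x 2.3 = 2,300 backend write IOPS once the redo log and doublewrite buffer are included.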

AWS IOPS/GB BASELINE: ~ AS ADVERTISED!

[Bar chart: measured IOPS/GB for P-IOPS m4.4xl, P-IOPS r3.2xl, and GP-SSD r3.2xl under 100% read and 100% write workloads]

IOPS/GB PER MYSQL INSTANCE

[Bar chart: MySQL read and write IOPS/GB for AWS P-IOPS m4.4xl vs. a Ceph cluster at 14% capacity (1x "m4.4xl") and 87% capacity (6x "m4.4xl")]

FOCUSING ON WRITE IOPS/GB: AWS THROTTLE WATERMARK FOR DETERMINISTIC PERFORMANCE

[Bar chart: write IOPS/GB of 26 for AWS P-IOPS m4.4xl, 78 for the Ceph cluster at 14% capacity, and 19 at 87% capacity]

A NOTE ON WRITE AMPLIFICATION: MYSQL ON CEPH WRITE PATH

MySQL INSERT -> InnoDB doublewrite buffer (x2) -> Ceph replication (x2) -> OSD journaling (x2)
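Multiplied out, a single logical write from MySQL can therefore be amplified roughly 2 x 2 x 2 = 8x by the time it reaches the flash devices; this is a back-of-the-envelope figure that ignores redo-log writes and any coalescing along the path.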

EFFECT OF CEPH CLUSTER LOADING ON IOPS/GB

[Bar chart: IOPS/GB at 14%, 36%, 72%, and 87% cluster capacity for 100% write and 70/30 R/W workloads; write IOPS/GB falls from 78 at 14% capacity to 19 at 87%]

CONSIDERING CORE-TO-FLASH RATIO

[Bar chart: IOPS/GB for Ceph clusters with 80 cores / 8 NVMe (87% capacity), 40 cores / 4 NVMe (87% capacity), 80 cores / 4 NVMe (87% capacity), and 80 cores / 12 NVMe (84% capacity), under 100% write and 70/30 R/W workloads]

HEAD-TO-HEAD PERFORMANCE

30 IOPS/GB: AWS EBS P-IOPS TARGET
25 IOPS/GB: CEPH AT 72% CLUSTER CAPACITY (WRITES)
78 IOPS/GB: CEPH AT 14% CLUSTER CAPACITY (WRITES)

HEAD-TO-HEAD PRICE/PERFORMANCE

$2.50: TARGET AWS EBS P-IOPS STORAGE $/IOP

IOPS/GB ON VARIOUS CONFIGS

[Bar chart: Sysbench write IOPS/GB for AWS EBS Provisioned-IOPS, Ceph on Supermicro FatTwin at 72% capacity, Ceph on Supermicro MicroCloud at 87% capacity, and Ceph on Supermicro MicroCloud at 14% capacity]

$/STORAGE-IOP ON THE SAME CONFIGS

[Bar chart: storage $/IOP (Sysbench write) of $2.40 for AWS EBS Provisioned-IOPS, $0.80 for Ceph on Supermicro FatTwin at 72% capacity, $0.78 for Ceph on Supermicro MicroCloud at 87% capacity, and $1.06 for Ceph on Supermicro MicroCloud at 14% capacity]

HEAD-TO-HEAD PRICE/PERFORMANCE

$2.50: TARGET AWS P-IOPS $/IOP (EBS ONLY)
$0.78: CEPH ON SUPERMICRO MICROCLOUD CLUSTER

TUNING CEPH BLOCK

TUNING CEPH BLOCK

• Format

• Order

• Fancy Striping

• TCP_NODELAY

RBD FORMAT

• Format 1

• Deprecated

• Supported by all versions of Ceph

• No reason to use it in a greenfield environment

• Format 2

• The newer, default format

• Supports snapshots and clones

RBD ORDER

• The chunk / striping boundary for the block device

• Default is 4MB -> order 22

• 4MB = 2^22 bytes

• Used the default during our testing (an example rbd create invocation follows)
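For illustration, the object size can be set explicitly at image-creation time; the pool and image names are placeholders, and --order was the contemporary way to express the object size (newer rbd releases prefer --object-size):

  rbd create --size 102400 --image-format 2 --order 22 volumes/mysql-data   # 100GB image, 4MB (2^22-byte) objects
  rbd info volumes/mysql-data                                               # verify the resulting order / object size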

RBD: Fancy Striping

• Only available to QEMU / librbd

• Finer-grained striping that parallelizes small writes across multiple objects

• Helps with some HDD workloads

• Used the default during our testing

TCP_NODELAY

• Disables the Nagle small-packet coalescing algorithm (see the ceph.conf sketch after this list)

• Important for latency sensitive workloads

• Good for maximizing IOPS -> MySQL

• Default in QEMU

• Default in KRBD

• Added in mainline kernel 4.2

• Backported to RHEL 7.2 (kernel 3.10-236+)
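For completeness, the messenger-level option can be pinned explicitly in ceph.conf on clients and OSDs; it already defaults to true in the releases discussed here, so this is only a sketch:

  [global]
  ms tcp nodelay = true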

TUNING QEMU: BLOCK VIRTUALIZATION

TUNING QEMU BLOCK

• Paravirtual Devices

• AIO Mode

• Caching

• x-data-plane

• num_queues

QEMU: PARAVIRTUAL DEVICES

• Virtio-blk

• Virtio-scsi

QEMU: AIO MODE

• Threads

• Software implementation of AIO using a thread pool

• Native

• Uses kernel AIO

• The way to go in the future

QEMU: CACHING

                        Writeback    None       Writethrough   Directsync
Uses host page cache    Yes          No         Yes            No
Guest disk WCE          Enabled      Enabled    Disabled       Disabled
rbd_cache               True         False      True           False
rbd_cache_max_dirty     25165824     0          0              0
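As an illustrative qemu command-line fragment (pool, image, and client id are placeholders), the io=native / cache=none combination used in the benchmarks below maps onto -drive options like these, with librbd's cache behavior following the table above:

  -drive file=rbd:volumes/mysql-data:id=cinder:conf=/etc/ceph/ceph.conf,format=raw,if=virtio,cache=none,aio=native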

BENCHMARKS

BENCHMARKS

• Sysbench OLTP, 32 tables of 28M rows each, ~200GB

• MySQL config: 50GB buffer pool, 8MB log file size, ACID (see the config sketch after this list)

• Filesystem: XFS with noatime, nodiratime, nobarrier

• Data reloaded before each test

• 100% reads: --oltp-point-select=100

• 100% writes: --oltp-index-updates=100

• 70%/30% reads/writes: --oltp-index-updates=28 --oltp-point-select=70 --rand-type=uniform

• 20 minute run time per test, iterations averaged

• 64 threads, 8 cores
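A minimal sketch of the corresponding MySQL and filesystem settings (the device path and any values not stated above are assumptions):

  # my.cnf fragment
  [mysqld]
  innodb_buffer_pool_size        = 50G
  innodb_log_file_size           = 8M
  innodb_flush_log_at_trx_commit = 1    # flush redo log at every commit ("ACID")
  sync_binlog                    = 1

  # mount the data volume with the stated XFS options (device path is illustrative)
  mount -o noatime,nodiratime,nobarrier /dev/vdb /var/lib/mysql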

BASIC QEMU PERFORMANCE

[Bar chart: read, write, and 70/30 R/W IOPS for qemu tcg, qemu-kvm defaults, io=threads cache=none, and io=native cache=none]

THREAD CACHING MODES

[Bar chart: read, write, and 70/30 R/W IOPS for io=threads with cache=none, cache=writethrough, and cache=writeback]

DEDICATED DISPATCH THREADS

[Bar chart: read, write, and 70/30 R/W IOPS for io=native cache=none, io=native cache=directsync, and io=native cache=directsync with iothread=1 and iothread=2]

DATA PLANE AND VIRTIO-SCSI QUEUES

[Bar chart: read, write, and 70/30 R/W IOPS for x-data-plane, virtio-scsi num-queues=4, virtio-scsi num-queues=2 vectors=3, and virtio-scsi num-queues=4 vectors=5]
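These variants can be sketched as QEMU options roughly as follows (IDs are illustrative, the backing -drive is assumed to be defined elsewhere with if=none,id=drive0, and x-data-plane was an experimental flag in older QEMU releases, since superseded by iothreads):

  # virtio-blk with a dedicated iothread (the successor to x-data-plane)
  -object iothread,id=iothread0 -device virtio-blk-pci,drive=drive0,iothread=iothread0

  # virtio-scsi with multiple request queues and MSI-X vectors
  -device virtio-scsi-pci,id=scsi0,num_queues=4,vectors=5 -device scsi-hd,drive=drive0,bus=scsi0.0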

CONTAINERS AND METAL

[Bar chart: read, write, and 70/30 R/W IOPS for bare metal (taskset -c 10-17), LXC (cgroup cpus 10-17), io=threads cache=none, io=native cache=none, and virtio-scsi num-queues=2 vectors=3]

SUPERMICRO MICRO CLOUD: CEPH MYSQL PERFORMANCE SKU

8x nodes in a 3U chassis (Model: SYS-5038MR-OSDXXXP)

Per-node configuration:
• CPU: Single Intel Xeon E5-2630 v4
• Memory: 32GB
• NVMe storage: Single 800GB Intel P3700
• Networking: 1x dual-port 10G SFP+

1x CPU + 1x NVMe + 1x SFP+ per node

THANK YOU!