Kernel Recipes 2014 - Performance does matter
DESCRIPTION
Deploying clouds is on everybody’s mind, but how do you make an efficient deployment? After setting up the hardware, a deep inspection of server performance is mandatory. In a farm of supposedly identical servers, a single mis-installation or mis-configuration can seriously degrade performance. If you want to discover such under-performance before users complain about their VMs, you have to detect it before installing any software. Another performance metric to know is “how many VMs can I load on top of my servers?”. Using the same methodology, it is possible to compare how a set of VMs performs against the bare-metal capabilities. The challenge is here: how do you automatically detect servers that under-perform? How do you ensure that a new server entering a farm will not degrade it? How do you measure the overhead of all the virtualization layers from the VM’s point of view?

Erwan Velu, Performance Engineer @ eNovance

TRANSCRIPT
Infrastructure Benchmarking
Methodology & Tooling
Who am I?
Erwan Velu ([email protected])
● Currently Performance Engineer @ eNovance
● Previous Experiences
○ Release Manager @ SiT (In-Flight Entertainment)
○ Presales Engineer @ Seanodes (Distributed Storage, Ceph-like)
○ Release Manager @ Mandriva (HPC product)
● Open Source Activity
○ Part of the Mageia Team
○ HDT (Hardware Detection Tool) Author and Syslinux Contributor
○ Fio Contributor
Why Benchmark an Infrastructure?
● Infrastructures deliver services to users, but ...
● Servers can be inter-dependent, as with Ceph or Swift
○ One server expects data from another
○ Servers will run at the speed of the slowest
● Users expect service to be constant
○ Servers of the same kind shall perform the same
○ Running on / moving to any hypervisor shall provide the same experience
● Don’t wait for customers to complain before checking the performance
● The goal is to get a quick view of a server farm’s performance
○ Does my server perform as expected?
○ Do all my servers perform the same?
What to Benchmark ?
● Processor
○ Every logical CPU
○ All logical CPUs
● Memory Bandwidth
○ Small and big blocks
○ From 1K to 2G
● Storage
○ Sequential and random
○ 1MB and 4K
○ Read and write
● Networking
○ All-to-all communication
Hey! What did you expect?
● Processor
○ Understand the raw power of a single core
○ Understand the efficiency of all cores (~75% on a 6-core Intel E5 CPU)
● Memory Bandwidth
○ Understand the bandwidth you can get from a VM
● Storage
○ Estimate whether you run at HDD or SSD speed
○ Ensure the block device is performing as expected
■ e.g. 70K IOPS on an SSD
■ or 200MB/s of bandwidth
● Networking
○ Validate that the switch isn’t a limiting factor
○ Ensure each server is performing up to its limits
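The ~75% all-core figure above boils down to a simple ratio of measured aggregate throughput to perfect linear scaling. A minimal sketch, using hypothetical sysbench-style numbers (illustrative, not real measurements):

```python
def scaling_efficiency(single_thread, all_threads, n_threads):
    """Ratio of measured aggregate throughput to perfect linear scaling."""
    return all_threads / (single_thread * n_threads)

# Hypothetical events/sec figures, illustrative only:
# one thread pinned to a single logical CPU vs. all 12 threads
# of a 6-core hyper-threaded CPU run together.
eff = scaling_efficiency(1000.0, 9000.0, 12)
print(f"scaling efficiency: {eff:.0%}")  # → scaling efficiency: 75%
```

An efficiency well below the expected figure for that CPU model is exactly the kind of anomaly the methodology is meant to surface.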
Tooling
● eDeploy
○ eNovance project to generate reproducible operating system builds
○ http://github.com/enovance/edeploy
● Automatic Health Check (AHC)
○ Part of the eDeploy repository
○ Builds a small operating system with tools & a benchmark procedure
● Benchmark tools
○ Sysbench (CPU & memory)
○ Fio (storage)
○ Netpipe / Netperf (network)
Analyzing results
● Each host uploads its results in a Python structure format
○ Saved on the server side in a directory
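Since the results travel as a Python structure, the round-trip can be sketched with a plain literal; the field names below are hypothetical, not AHC’s actual schema:

```python
import ast

# Hypothetical result payload; the field names are illustrative only.
result = {
    'host': 'server-01',
    'cpu': {'loops_per_sec': 1000},
    'memory': {'bandwidth_1K_MBps': 4200},
}

# A client could serialize the structure as a Python literal...
payload = repr(result)

# ...and the server can parse it back without executing arbitrary code.
restored = ast.literal_eval(payload)
assert restored == result
```

Using `ast.literal_eval` rather than `eval` keeps the server safe even if a host uploads garbage.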
● The Cardiff tool analyzes a series of results to get a clear picture of the run
○ The human brain cannot synthesize so much data easily
● We must compare apples to apples
○ Cardiff first groups similar hosts according to their hardware properties
● For each kind of test (CPU, memory, disk, network), and for each group of hosts
○ Compute min, max, average, standard deviation
○ If the standard deviation is too high, the group isn’t stable enough
○ If a host is too far from the average (relative to the standard deviation), the host is suspicious
■ It has to be inspected by a human to understand the variation
○ Otherwise, hosts & groups are “OK”
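The group-then-deviate logic above fits in a few lines. A minimal illustration of the approach, with hypothetical numbers and an illustrative threshold (not Cardiff’s actual code):

```python
from collections import defaultdict
from statistics import mean, stdev

def group_by_profile(hosts):
    """Group hosts by hardware profile so we compare apples to apples."""
    groups = defaultdict(dict)
    for name, info in hosts.items():
        groups[(info['cpu'], info['ram_gb'])][name] = info['iops']
    return groups

def flag_outliers(metrics, max_dev=1.5):
    """Flag hosts further than max_dev standard deviations from the average.

    The 1.5-sigma threshold is illustrative; the real cutoff is a tuning choice.
    """
    avg, sd = mean(metrics.values()), stdev(metrics.values())
    return {h: v for h, v in metrics.items() if abs(v - avg) > max_dev * sd}

# Hypothetical 4K random-read IOPS for one group of identical servers.
group = {'node-1': 70200, 'node-2': 69800, 'node-3': 70100,
         'node-4': 69900, 'node-5': 70000, 'node-6': 35000}
print(flag_outliers(group))   # node-6 clearly under-performs its siblings
```

A flagged host like `node-6` is exactly the “suspicious” case that a human then inspects.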
Getting into the cloud !
● We do understand the bare-metal performance
● We can deploy an OpenStack & run the same tooling inside VMs
○ Same tools, same metrics, same output
● Benchmarks have to be synchronized to get a simultaneous load on a component
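This synchronization requirement is classically met with a barrier: no worker starts until everyone is ready. A minimal sketch using Python threads in place of real benchmark clients (the workload is a toy stand-in):

```python
import threading
import time

N_WORKERS = 4
barrier = threading.Barrier(N_WORKERS)
start_times = {}

def worker(name):
    barrier.wait()                       # all workers start loading together
    start_times[name] = time.perf_counter()
    sum(range(100_000))                  # stand-in for the real benchmark load

threads = [threading.Thread(target=worker, args=(f"vm-{i}",))
           for i in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The barrier keeps all start times inside a narrow window.
spread = max(start_times.values()) - min(start_times.values())
print(f"{len(start_times)} workers, start spread: {spread * 1e3:.2f} ms")
```

Across real hosts the same idea applies, just with a network rendezvous instead of an in-process barrier.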
● Results can be compared with bare-metal performance
○ to estimate the loss induced by the virtualization layers
○ to assess how a given number of VMs performs on a given infrastructure
● Can also be used to measure the resulting performance of
○ a patch
○ a tuning of the infrastructure
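The bare-metal comparison itself is a simple ratio. A sketch with hypothetical bandwidth numbers (illustrative, not measurements):

```python
def virtualization_overhead(bare_metal, vm):
    """Fraction of bare-metal performance lost when running inside a VM."""
    return 1.0 - vm / bare_metal

# Hypothetical sequential-read bandwidth in MB/s.
loss = virtualization_overhead(bare_metal=200.0, vm=170.0)
print(f"{loss:.0%} of bandwidth lost to virtualization")
```

Running the same calculation before and after a patch or a tuning change quantifies its effect in the same units.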
AHC & OCP
● OCP is brand-new hardware
○ It could even be at prototype level
● AHC is a quick path to understanding how the hardware performs
○ AHC is automated and reduces human error
○ AHC provides a fixed comparison point between people/projects/companies
● AHC is open
○ New benchmarks can be easily added
○ It’s fully open source
○ So let’s hack it!