© Copyright IBM Corporation, 2014
Best practices guide for IBM Power Systems solution for MariaDB
Guidance for the installation and tuning of MariaDB running on
Linux on Power featuring the new
IBM POWER8 technology
Axel Schwenke
MariaDB Corporation
Sergey Vojtovich
MariaDB Foundation
Hari Reddy
IBM Systems and Technology Group ISV Enablement
December 2014
Table of contents
Abstract ..................................................................................................................................... 1
Introduction .............................................................................................................................. 1
Prerequisites ............................................................................................................................. 1
Executive summary .................................................................................................................. 1
Storage engines ....................................................................................................................................... 2
Server architecture ................................................................................................................................... 2
Installation of MariaDB ............................................................................................................. 4
Installing from binary packages ............................................................................................................... 4
Installing from binary .tar files .................................................................................................................. 4
Building and installing from source .......................................................................................................... 5
Configuring MariaDB ................................................................................................................ 7
Data directory ........................................................................................................................................... 7
XtraDB configuration ................................................................................................................................ 8
Sample MariaDB configuration file (my.cnf) .......................................................................................... 10
Tuning of MariaDB .................................................................................................................. 11
MariaDB tuning ...................................................................................................................................... 11
Miscellaneous hints ................................................................................................................................ 11
Power Systems built with the POWER8 technology ............................................................ 12
Linux on Power tuning guidelines......................................................................................... 13
Simultaneous Multithreading (SMT) ...................................................................................................... 13
Hardware prefetch ................................................................................................................................. 14
PowerKVM .............................................................................................................................. 15
PowerKVM tuning ................................................................................................................... 17
Guest CPU model and topology ............................................................................................................ 17
Mapping of guest CPUs ......................................................................................................................... 17
CPU placement .............................................................................................................. 18
CPU tuning ..................................................................................................................... 20
I/O tuning (virtio) .................................................................................................................................... 21
virtio ................................................................................................................................ 21
cache .............................................................................................................................. 22
io...................................................................................................................................... 22
Memory tuning ....................................................................................................................................... 23
Network tuning ....................................................................................................................................... 23
Sample guest XML configuration ........................................................................................................... 23
Summary ................................................................................................................................. 27
Acknowledgements ................................................................................................................ 28
Resources ............................................................................................................................... 29
About the authors ................................................................................................................... 30
Trademarks and special notices ........................................................................................... 31
Abstract
This white paper describes the installation and tuning of MariaDB version 10.0.14 on IBM Power Systems servers featuring the IBM POWER8 processor technology. The target audience is users and system integrators interested in using Linux on Power and MariaDB. Some familiarity with MariaDB, Linux on Power, and IBM PowerKVM might be helpful.
Introduction
This paper provides best practices guidelines for deploying MariaDB on Linux on Power featuring IBM®
POWER8™ and IBM PowerKVM technology. The instructions and guidelines described in this paper are
applicable to any version of MariaDB, Linux on Power distributions, PowerKVM and any POWER8
Systems capable of running Linux on Power distributions and PowerKVM. The subsequent sections
provide an overview of the MariaDB architecture, the instructions to build, install, and run MariaDB on
Linux on Power and PowerKVM, and guidelines to tune MariaDB and PowerKVM to run MariaDB
efficiently on Linux on Power and PowerKVM.
Prerequisites
In addition to prior knowledge of MariaDB, basic familiarity with commands and tools used in Linux,
PowerKVM, and Linux on Power might be very helpful.
Executive summary
MariaDB scales extremely well on POWER8, mostly because the thread synchronization costs are very
low. That means, MariaDB can use many processor cores and must be configured to make the most out of
that. The following techniques help improve performance of running MariaDB on a Linux on Power system
using the POWER8 technology.
- Partition global data structures such as the InnoDB buffer pool and the adaptive hash index. Bump up max_connections, table_open_cache, and thread_cache.
- Configure a higher number of InnoDB background threads: read I/O threads, write I/O threads, and purge threads. If you use replication, configure the slave to use multiple threads.
- Tune the I/O subsystem; use an appropriate I/O scheduler, file system, and mount options.
- Use higher levels of simultaneous multithreading (SMT) of POWER8 to use the processor resources efficiently.
- Use PowerKVM and virtio, a virtualization standard for network and disk device drivers, to get high-performance network and disk operations.
- Configure the PowerKVM guest so that MariaDB can make use of the nonuniform memory access (NUMA) architecture of POWER8.
Introduction to MariaDB
MariaDB is a fork of the MySQL database. The MariaDB Corporation (https://mariadb.com/) is the
company behind the MariaDB development. MariaDB aims to be a fully open source drop-in replacement
for MySQL. MariaDB extends MySQL in several ways. For a feature comparison, refer to
https://mariadb.com/kb/en/mariadb/mariadb-vs-mysql-features/.
Support for the POWER8 platform was added in MariaDB 10.0.14. The MariaDB 10.0.x series
corresponds to MySQL 5.6.x (feature wise). Refer to Figure 1 for a representation of the MariaDB
architecture.
Storage engines
The MariaDB architecture consists of the components presented in Figure 1. MySQL and hence MariaDB
use a design where logical and physical operations on data are separated. The part of the server that
handles physical data storage is called the storage engine. MariaDB comes with a set of standard
storage engines. The storage engine application programming interface (API) allows adding storage
engines in the form of dynamic libraries. This feature is used by third parties to add their own storage
engines.
The most common storage engine in MySQL is InnoDB. InnoDB is a fully Atomicity, Consistency, Isolation,
Durability (ACID) compliant, transactional engine. Some years ago Percona, another MySQL offspring,
started XtraDB. This is a patched version of the InnoDB engine and is at the heart of the Percona Server
(refer to http://www.percona.com/software/percona-server).
MariaDB uses XtraDB by default. InnoDB and XtraDB are mutually exclusive, partly because XtraDB
identifies itself as InnoDB. MariaDB allows you to choose between XtraDB and InnoDB.
Server architecture
The MariaDB server is a single process by the name mysqld running under a nonprivileged user ID
(default: mysql). The mysqld process spawns a few helper threads when it is started. One additional
worker thread is started for each connection that is opened by a client or an application. When the
connection is closed, the thread goes to the thread cache and is eventually reused.
For applications that require a very large number (thousands) of open connections, this model does not
scale well. For those cases, MariaDB offers a thread pool (which must be enabled in the configuration file).
The thread pool limits the number of worker threads and multiplexes connections to threads.
Applications communicate with the MariaDB server over a socket, which can be either a UNIX® domain
socket for applications running on the same server or a TCP socket on the default port 3306 when the
client application resides on a separate server in a clustered environment. The protocol is 100%
compatible between MariaDB and MySQL. So, any application built for MySQL can work with MariaDB.
The general architecture of MariaDB is shown in Figure 1.
Figure 1: MariaDB architecture
Installation of MariaDB
This section describes the following three ways to install MariaDB on Linux on Power:
- Installation from binary packages
- Installation from the .tar files
- Building from source and installing
Installing from binary packages
The MariaDB binary packages for Linux on Power are available on the MariaDB.com website as part of
the MariaDB Enterprise product suite. You can download the binary packages at:
https://mariadb.com/user/register?destination=my_portal/download. At this time, the latest version
available is: 10.0.14. These packages are available for all registered users (registration is free). This
section provides the essential steps to download and install MariaDB from the binary packages.
Perform the following steps to install MariaDB from binary packages.
1. Visit https://mariadb.com/user/register?destination=my_portal/download and create a new account
(registration is required and it is free of cost).
2. On the main download page in MariaDB Enterprise (left column), select the MariaDB version
(10.0.14 at the time of publishing this paper).
3. Select Ubuntu 14.04 as the platform.
4. Click the Ubuntu 14.04 LTS (Trusty) Packages link.
5. Run the following commands to install the MariaDB server.
sudo apt-get install python-software-properties software-properties-common
sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 0xd324876ebe6a595f
sudo add-apt-repository 'deb http://USER:PASSWORD@mariadb.com/mariadb-enterprise/10.0/repo/ubuntutrusty main'
sudo apt-get update
sudo apt-get install mariadb-server
Note: USER and PASSWORD must be replaced with a valid mariadb.com account ID and password.
Installing from binary .tar files
The MariaDB binary .tar files for Linux on Power are available at the MariaDB.com website as part of
MariaDB Enterprise product suite. You can download the binary .tar files from
https://mariadb.com/user/register?destination=my_portal/download. At this time, the latest version
available is: 10.0.14. The .tar files are available for all registered users (registration is free). Essential
steps to download and install MariaDB from the binary .tar files on Linux on Power are listed in this
section.
1. Visit https://mariadb.com/user/register?destination=my_portal/download and create a new account
(registration is required and it is free of cost).
2. On the main download page in MariaDB Enterprise (left column), select the MariaDB version
(10.0.14 at the time of publishing this paper).
3. Select Ubuntu 14.04 as the platform.
4. To download the .tar files, click mariadb-10.0.14-linux-ppc64le.tar.gz and save the file at /tmp.
5. Run the following commands to unpack the binary files:
cd /usr/local
tar xfz /tmp/mariadb-10.0.14-linux-ppc64le.tar.gz
cd mariadb-10.0.14-linux-ppc64le
6. Follow the instructions in the INSTALL-BINARY file located in this directory to
install the MariaDB server.
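The INSTALL-BINARY sequence generally follows the pattern below. This is a paraphrase of the generic MySQL/MariaDB binary-install steps, not a copy of the bundled file, which remains authoritative. The commands are collected in a variable and only printed, so the sketch has no side effects; review the output and pipe it to sudo sh to apply it.

```shell
# Sketch of the usual binary-install steps; the bundled INSTALL-BINARY file
# is authoritative. The path assumes the unpack location from step 5.
BASEDIR=/usr/local/mariadb-10.0.14-linux-ppc64le
STEPS="groupadd mysql
useradd -r -g mysql mysql
chown -R mysql $BASEDIR
$BASEDIR/scripts/mysql_install_db --user=mysql --basedir=$BASEDIR
$BASEDIR/bin/mysqld_safe --user=mysql &"
echo "$STEPS"        # review, then: echo "$STEPS" | sudo sh
```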
Building and installing from source
The MariaDB source is available at the MariaDB.com website as part of MariaDB Enterprise product suite.
You can download the source tarball from
https://mariadb.com/user/register?destination=my_portal/download. At this time, the latest version
available is: 10.0.14. The tarballs are available for all registered users (registration is free).
You need several prerequisites installed to build MariaDB from source. Besides cmake and make, you
need a C/C++ toolchain. IBM Advance Toolchain 8.0[1] is recommended. In addition, you need libncurses5-dev,
libssl-dev, libaio-dev, libjemalloc-dev, and bison. cmake will complain if anything is missing. Refer to
the MySQL manual for a detailed description of the necessary post-installation steps:
https://dev.mysql.com/doc/refman/5.6/en/source-installation.html. This section provides the essential steps
to build and install MariaDB from the source .tar files.
[1] You can download and install IBM Advance Toolchain from
ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W51a7ffcf4dfd_4b40_9d82_446ebc23c550/page/IBM%20Advance%20Toolchain%20for%20PowerLinux%20Documentation
Perform the following steps to build and install MariaDB from the source .tar files.
1. Visit https://mariadb.com/user/register?destination=my_portal/download and create a new account
(registration is required and it is free of cost).
2. On the main download page in MariaDB Enterprise (left column), select the MariaDB version
(10.0.14 at the time of publishing this paper).
3. Select Source Code as the platform.
4. Click mariadb-10.0.14-.tar.gz to download the .tar file.
5. Run the following commands to install the prerequisites.
apt-get install cmake gcc g++ make libaio-dev libevent-dev \
    libjemalloc-dev libncurses5-dev bison libssl-dev
6. Run the following commands to build and install the MariaDB server.
tar xfz mariadb-10.0.14-.tar.gz
cd mariadb-10.0.14
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr/local/mariadb-10.0.14 \
    -DCMAKE_BUILD_TYPE=Release
make
make install
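To build with the IBM Advance Toolchain recommended above, you can point cmake at its compilers. The /opt/at8.0 prefix is an assumption based on the Advance Toolchain's usual install location; adjust it to your installed version. The sketch only prints the command so it stays side-effect free; remove the echo to run it from the build directory.

```shell
# Hypothetical Advance Toolchain build; /opt/at8.0 is an assumed install prefix.
AT=/opt/at8.0
echo cmake -DCMAKE_INSTALL_PREFIX=/usr/local/mariadb-10.0.14 \
     -DCMAKE_BUILD_TYPE=Release \
     -DCMAKE_C_COMPILER=$AT/bin/gcc \
     -DCMAKE_CXX_COMPILER=$AT/bin/g++
```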
Configuring MariaDB
MariaDB is configured through a set of server variables. Almost all of those have safe default values, but
some must be modified. This is done in a file named /etc/mysql/my.cnf. For a description of the my.cnf file,
refer to the following URL: https://dev.mysql.com/doc/refman/5.6/en/option-files.html.
Many of the server variables are dynamic variables, which means that they can be modified by the SQL
SET statement (SET GLOBAL variable=value) while the server is running (and the change becomes
effective immediately). There are other variables which must be set in the .cnf file and require a restart of
the MariaDB server. Refer to the URL https://dev.mysql.com/doc/refman/5.6/en/mysqld-option-tables.html for a description of all these variables.
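For example, a dynamic variable such as innodb_io_capacity can be changed at run time through the mysql client, while a change that should survive a restart also goes into my.cnf. The SET GLOBAL statement is standard; the client invocation is commented out in this sketch because it assumes a running server and valid credentials.

```shell
# SET GLOBAL takes effect immediately; adding the line to my.cnf makes it permanent.
SQL='SET GLOBAL innodb_io_capacity = 4000;'
echo "$SQL"
# mysql -u root -p -e "$SQL"    # apply on a running server (assumed credentials)
# Also add under [mysqld] in /etc/mysql/my.cnf:  innodb_io_capacity = 4000
```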
Data directory
The storage engines used in MariaDB store data in files below a common subdirectory, called the data
directory. This is one of the most important server variables to set. The directory must exist and be owned
by the user ID that started the mysqld process. The initial setup of the data directory is done with the
mysql_install_db tool. This tool creates the initial system databases and accounts. The installation from
the binary packages performs this setup automatically.
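For the .tar and source installs, where the packaging does not do this for you, the data directory setup looks like the following minimal sketch. The path is an example only; the chown and mysql_install_db steps are commented out because they need root and an installed server.

```shell
# Example data-directory preparation; the /tmp path is for demonstration only.
DATADIR=${DATADIR:-/tmp/mariadb-datadir-demo}
mkdir -p "$DATADIR"
# chown -R mysql:mysql "$DATADIR"                       # the mysqld user must own it
# mysql_install_db --user=mysql --datadir="$DATADIR"    # creates system databases
ls -d "$DATADIR"
```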
MariaDB works with most of the file systems in use on modern Linux distributions. Best performance is
reached with ext4 or xfs. Btrfs is known to have performance problems with database workloads. It is
recommended to use the noatime mount option and, for ext4, also the nobarrier option. For the block device
holding the file system, it is recommended to use the deadline I/O scheduler (refer to the following listing).
The I/O scheduler used for a block device can be checked and set in sysfs:
cat /sys/devices/vio/2000/host0/target0:0:0/0:0:0:0/block/sda/queue/scheduler
noop deadline [cfq]
echo "deadline" > /sys/devices/vio/2000/host0/target0:0:0/0:0:0:0/block/sda/queue/scheduler
cat /sys/devices/vio/2000/host0/target0:0:0/0:0:0:0/block/sda/queue/scheduler
noop [deadline] cfq
Listing 1: Instructions to set deadline scheduler
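The sysfs setting in Listing 1 does not survive a reboot. One common way to make it persistent (an assumption, not taken from this paper, and specific to distributions that boot through GRUB) is the elevator= kernel parameter in /etc/default/grub:

```
GRUB_CMDLINE_LINUX_DEFAULT="elevator=deadline"
```

Run update-grub (or the distribution's equivalent) and reboot for the change to take effect.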
For decent I/O performance, you must use a high-performance I/O subsystem. Good results have been
seen with local Redundant Array of Independent Disks (RAID) setups using a battery-backed RAM cache,
and storage area network (SAN) storage also works well. Getting decent performance from
network-attached storage (NAS) devices can be tricky and typically requires much tuning on the NAS side.
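The recommended mount options are typically made permanent in /etc/fstab. The entry below is illustrative only: the device and mount point are assumptions, noatime (which suppresses access-time updates) is a common choice for database file systems, and nobarrier is the ext4 option discussed above.

```
# device     mount point  fs    options                      dump pass
/dev/sda1    /data        ext4  defaults,noatime,nobarrier   0    2
```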
XtraDB configuration
Proper setting of several configuration variables is important for efficient operation of the XtraDB
storage engine. Some of the variables are listed in Table 1.
innodb_buffer_pool_size
  Description: Memory buffer shared between all mysqld threads, caching InnoDB data and index pages.
  Recommended value: As large as possible. Set it to 50% to 85% of available memory.

innodb_buffer_pool_instances
  Description: The InnoDB buffer pool can be partitioned to reduce mutex contention.
  Recommended value: Try the number of processor cores; values above 32 seem to have little effect though.

innodb_adaptive_hash_index_partitions
  Description: The InnoDB adaptive hash index can be partitioned to reduce mutex contention.
  Recommended value: Try the number of processor cores; values above 32 seem to have little effect though.

innodb_log_file_size
  Description: The size of one XtraDB redo log. A large size results in long recovery and a small size results in slow performance, especially for write-intensive workloads.
  Recommended value: The total size of all redo logs should be 5% to 20% of the buffer pool size.

innodb_log_buffer_size
  Description: This is a buffer for write operations to the InnoDB redo log. The log is flushed at every COMMIT or at least once a second. Ideally, the buffer should be big enough to hold all subsequent redo log write operations.
  Recommended value: Values up to 16 MB are reasonable.

innodb_io_capacity
  Description: The number of I/O operations that InnoDB can do per second. Pure background I/O will not exceed that limit. When InnoDB is forced to use synchronous I/O, it will however exceed that limit.
  Recommended value: This should match the characteristics of your I/O subsystem.

max_connections
  Description: The maximum number of concurrent connections.
  Recommended value: Depends on the application.

table_open_cache
  Description: A shared cache for table handles. If this cache is too small, it will cause severe performance loss.
  Recommended value: Set to (max_connections) * (maximum number of tables used in a single operation, that is, a JOIN or a subquery).

For write workloads:

innodb_flush_neighbors
  Description: InnoDB flushes dirty pages ordered by age. If a page to be flushed has dirty neighbors, those can be flushed at the same time.
  Recommended value: Enable neighbor flushing on traditional disks; disable it on flash storage.

innodb_read_io_threads, innodb_write_io_threads
  Description: Number of InnoDB background I/O threads.
  Recommended value: Depends on the storage subsystem. If it can do multiple I/O requests in parallel, use that number.

Table 1: XtraDB configuration parameters
Sample MariaDB configuration file (my.cnf)
A sample MariaDB/InnoDB configuration file is shown in the following listing. This file can be used as a
starting template and modified as needed.
[mysqld]
basedir = /usr/local/mariadb-10.0.14
datadir = /data/mariadb
performance-schema = false
max_connections = 1000
back_log = 150
table_open_cache = 2000
key_buffer_size = 16M
query_cache_type = 0
join_buffer_size = 32K
sort_buffer_size = 32K
innodb_file_per_table = true
innodb_open_files = 100
innodb_data_file_path = ibdata1:50M:autoextend
innodb_flush_method = O_DIRECT_NO_FSYNC
innodb_log_buffer_size = 16M
innodb_log_file_size = 4G
innodb_log_files_in_group = 2
innodb_buffer_pool_size = 32G
innodb_buffer_pool_instances = 32
innodb_adaptive_hash_index_partitions = 32
innodb_thread_concurrency = 0
#tuning for SAN storage
innodb_adaptive_flushing = 1
innodb_flush_neighbors = 1
innodb_io_capacity = 4000
innodb_io_capacity_max = 6000
innodb_lru_scan_depth = 4096
innodb_purge_threads = 2
innodb_read_io_threads = 8
innodb_write_io_threads = 16
Listing 2: Sample MariaDB/XtraDB configuration file
Tuning of MariaDB
Improving the performance of MariaDB running on Linux on Power requires tuning of both the MariaDB
application and the Linux on Power system. This section provides the tuning information specific to
MariaDB. The section “Linux on Power tuning guidelines” provides details about tuning the underlying
Linux on Power system and PowerKVM running MariaDB.
MariaDB tuning
Because MariaDB inherits most of its features from MySQL, most MySQL tuning guidelines apply. Check
the MySQL manual at: https://dev.mysql.com/doc/refman/5.6/en/optimization.html.
The XtraDB engine has its own tuning guide at: http://www.percona.com/doc/percona-server/5.6/.
Finally, the MariaDB knowledge base contains a MariaDB-specific tuning guide:
https://mariadb.com/kb/en/mariadb/documentation/optimization-and-tuning/
Miscellaneous hints
When running MariaDB on NUMA hardware, use of the --numa-interleave option for mysqld_safe is
recommended. This option can also be set in my.cnf in a [mysqld_safe] section.
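In my.cnf, that looks like the fragment below (option names in my.cnf conventionally drop the leading dashes; dashes and underscores are interchangeable):

```
[mysqld_safe]
numa-interleave
```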
Power Systems built with the POWER8 technology
This section presents an overview of the POWER8 technology. POWER8 is a multicore, multichip (node),
and a multisocket processor technology. The number of chips and sockets available can vary with the
model purchased. A representative layout of the POWER8 processor, which provides double the memory
bandwidth of the IBM POWER7+™ processor, is shown in the following figure.
Figure 2: POWER8 processor
Linux on Power tuning guidelines
This section describes Linux on Power tuning guidelines for:
- Simultaneous multithreading
- Hardware prefetch
Simultaneous Multithreading (SMT)
When running Linux under PowerKVM, you have the option to choose among different SMT levels for the
guest. Setting higher levels of SMT in the PowerKVM guest improves system utilization and throughput.
The best choice of SMT level depends on how the database is used, specifically on the concurrency
of the database workload.
For mostly read-only workloads, if the application runs many SQL queries concurrently, then the system
throughput can be maximized by configuring a system SMT level of 8. If the database load has lower
concurrency, especially if there are no more concurrent operations than virtual processor cores when
configured in SMT4, then SMT4 is the better choice.
For database workloads with a significant amount of writes, SMT4 is the better choice in most cases.
The following figure shows typical system behavior of a two-socket / 20 core POWER8 system. With
SMT4, this system has 80 virtual processor cores, and with SMT8, it has 160 virtual processor cores.
[Figure: OLTP read-only system throughput (transactions per second) versus number of client threads (1 to 320), for SMT4 and SMT8]
Figure 3: MariaDB read-only performance for various SMT settings and thread counts in the MariaDB server
Recommendation: Enable the PowerKVM guest to run up to eight SMT threads per virtual core, giving the
user the flexibility to adjust the SMT setting on the guest as needed. For mostly read-only workloads
that exhibit higher levels of concurrency, set the SMT level to 8 in the PowerKVM guest. For read-only
workloads with lower concurrency, and for read-write workloads, set the SMT level to 4. Also, it is
recommended to conduct a benchmark analysis with your workload and resource usage in order to determine
the SMT setting that can benefit your workload.
You can use the following Linux on Power command to set SMT=8 in the KVM guest:
ppc64_cpu --smt=8
The correct choice of the SMT level also depends on how the application uses the database. If the
application can keep many database connections busy, then higher SMT levels give a benefit because
MariaDB can then schedule the resulting worker threads on more virtual processors. But, if the application
uses only a few connections to the database, then the additional virtual processors would not be used
anyway.
Hardware prefetch
The Data Stream Control Register (DSCR) of IBM POWER® processors is used to control the degree of
aggressiveness of memory prefetching for loads and stores. The performance of MariaDB improves slightly when
the hardware prefetch is turned off. The command to turn hardware prefetch on or off is shown in this
section.
Recommendation: Turn the hardware prefetch off before starting the MariaDB server. Preferably, the
hardware prefetch should be turned off on the KVM guest. Always conduct a benchmark analysis with your
workload and resource usage in order to determine the DSCR setting that can benefit your workload.
Use the following command to turn the hardware prefetch ON: ppc64_cpu --dscr=0
Use the following command to turn the hardware prefetch OFF: ppc64_cpu --dscr=1
PowerKVM
IBM PowerKVM is a hypervisor that allows virtualization of a Linux on Power system, using the open
source virtualization standard, Kernel Virtual Machine (KVM). The PowerKVM environment consists of the
physical hardware (node), the PowerKVM OS and hypervisor (host), and KVM guests (domains). Refer to the
following figure. PowerKVM supports a variety of Linux operating systems as the guest operating system.
Figure 4: PowerKVM environment
virsh is a PowerKVM system application that uses the command-line interface (CLI) to manage the
PowerKVM guests. Some examples of the virsh command are given in this section. A PowerKVM guest
can be configured by entering its definition in an XML format into a file and giving the XML file as a
parameter to the virsh command. The virsh command can be used to manage the KVM guests. For
more details about the virsh command, refer to “man virsh” on a PowerKVM host. The virsh command
does not work on a PowerKVM guest.
The following list provides examples of the virsh command.
To define a guest VM: virsh define p215vm135.xml
To start a guest VM: virsh start p215vm135
To shut down a guest VM: virsh shutdown p215vm135
To undefine a guest VM: virsh undefine p215vm135
To display guest information: virsh dominfo p215vm135
To list the virtual machines: virsh list
To dump the KVM guest configuration: virsh dumpxml p215vm135 > p215vm135-current.xml
PowerKVM tuning
The PowerKVM guest definition comprises several elements. In this section, the following elements are
described:
- Guest CPU model and topology
- Mapping of guest CPUs
- I/O tuning (virtio)
- Network tuning (virtio)
For more details about the KVM tunable elements described in this section, refer to:
http://libvirt.org/formatdomain.html#elements and http://wiki.libvirt.org/page/Virtio. For an excellent tutorial
on these topics, refer to:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W51a7ffcf4dfd_4b40_9d82_446ebc23c550/page/Basic%20XML%20tuning%20tips%20for%20PowerKVM%20guests
Guest CPU model and topology
The cpu element can be used to define the PowerKVM guest CPU topology, including the maximum SMT mode
as indicated by the threads parameter. The three parameters sockets, cores, and threads define the total
number of vCPUs allocated to a PowerKVM guest, using the formula shown later in this section.
Recommendation: Always set the number of threads to 8 in the guest configuration setup.
Use the following topology specification to create a guest with 80 vCPUs on one socket using 10 cores
and SMT=8:
<cpu>
<topology sockets='1' cores='10' threads='8'/>
</cpu>
Definition of the number of CPUs allocated to a PowerKVM guest:
Number of vCPUs = sockets * cores per socket * threads per core = 1 * 10 * 8 = 80
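The formula can be checked with simple shell arithmetic (values taken from the example topology):

```shell
# vCPUs = sockets x cores per socket x threads per core
sockets=1; cores=10; threads=8
vcpus=$((sockets * cores * threads))
echo "$vcpus"   # 80
```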
Mapping of guest CPUs
IBM POWER8 processor-based servers use a NUMA memory architecture. POWER8 implements NUMA
with a multicore, multichip (node), and a multisocket processor technology. numactl (on host and guest)
and virsh (on host only) commands provide information about the current NUMA configuration on the host
as well as the guest. Use the following commands to get the host and guest NUMA configurations.
numactl -H
virsh nodeinfo
In PowerKVM, the guest can be configured to be “NUMA-aware” which allows the user to closely map the
application to the hardware resources. This can improve the performance of the MariaDB application
running on a POWER8 system. The advantage of pinning the vCPUs to the host CPUs is that you can
control where the guest runs, reduce the number of scheduler switches, and improve the possibility of
getting data out of the caches. Pinning of guest vCPUs to the host CPUs is useful when a subset of the
host CPUs need to be allocated to the KVM guest. This section describes two different methods of
mapping the guest vCPUs to the host CPUs.
CPU placement
In this specification, the vCPUs in a guest can run on any of the host CPUs allocated to the guest.
vcpu
The content of this element, vcpu, defines the maximum number of virtual CPUs allocated for the guest
OS, which must be between 1 and the maximum number supported by the hypervisor.
cpuset
The optional attribute cpuset is a comma-separated list of physical CPU numbers that the domain
process and virtual CPUs can be pinned to by default. The pinning policy of the domain process and
virtual CPUs can also be specified separately by the cputune element and will override the specification
in the vcpu element.
placement
The optional attribute placement can be used to indicate the CPU placement mode for the domain
process. The recommended value for this attribute is 'static'.
An example of how to set the vcpu element is given in the following listings:
numactl -H
available: 4 nodes (0-1,16-17)
node 0 cpus: 0 8 16 24 32
node 0 size: 65536 MB
node 0 free: 6668 MB
node 1 cpus: 40 48 56 64 72
node 1 size: 65536 MB
node 1 free: 8216 MB
node 16 cpus: 80 88 96 104 112
node 16 size: 65536 MB
node 16 free: 8205 MB
node 17 cpus: 120 128 136 144 152
node 17 size: 65536 MB
node 17 free: 6159 MB
node distances:
node 0 1 16 17
0: 10 20 40 40
1: 20 10 40 40
16: 40 40 10 20
17: 40 40 20 10
The host has four NUMA nodes, each with five cores and 64 GB of memory.
Listing 3: Use numactl -H on the host to get the host hardware configuration
<domain>
...
<vcpu placement='static' cpuset='80,88,96,104,112,120,128,136,144,152'>80</vcpu>
...
</domain>
Listing 4: vCPUs assigned to 10 cores on the second socket on the host
The ten physical cores (80, 88, 96, 104, 112, 120, 128, 136, 144, and 152) from the two NUMA nodes
(16 and 17) on the host are assigned to the guest. Based on the guest topology defined earlier, these
are 10 virtual cores, each with eight (SMT) threads, giving the guest a total of 80 vCPUs.
The vCPUs on the guest are numbered 0 through 79. The eight vCPUs that belong to a virtual core are
numbered consecutively (0 through 7, for example) and are co-scheduled on the same physical core on the
host. Based on the placement specification given above, the scheduler is allowed to assign each set of
eight vCPUs to any of the 10 physical cores on the host.
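The vCPU-to-virtual-core numbering can be expressed compactly. A small illustrative sketch (the names are ours, not libvirt's), assuming SMT=8 as recommended:

```python
SMT = 8  # threads per virtual core, as recommended for POWER8 guests

def virtual_core(vcpu):
    """Virtual core that a guest vCPU belongs to: consecutive groups
    of SMT vCPUs are co-scheduled on one physical host core."""
    return vcpu // SMT

# vCPUs 0..7 share virtual core 0; vCPU 79 is the last thread of core 9.
print([virtual_core(v) for v in (0, 7, 8, 79)])  # [0, 0, 1, 9]
```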
virsh dominfo p215vm135
Id: 8
Name: p215vm135
UUID: 6801ed05-65d0-4910-b459-7e2adaf2c971
OS Type: hvm
State: running
CPU(s): 80
CPU time: 306.3s
Max memory: 131072000 KiB
Used memory: 131072000 KiB
Listing 5: Run the virsh command on the host to confirm the guest has 80 vCPUs
numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 0 size: 127856 MB
node 0 free: 126053 MB
node distances:
node 0
0: 10
Listing 6: Run numactl -H on the guest to get the guest view of the vCPUs allocated
CPU tuning
cputune
The content of this element, cputune, defines how each of the guest vCPUs can be allocated to the
physical processors on the host. This allows more fine-grained assignment of specific vCPUs than the
vcpu element allows. If both cputune and vcpu elements are specified, cputune takes
precedence over vcpu.
vcpupin
The optional vcpupin element specifies which of the host's physical CPUs the domain vCPU will be pinned
to. You should pin vCPU "virtual core" groups to the same set of host CPUs. That is, if you set SMT
threads=8, pin each group of eight vCPUs to the same CPU set.
cpuset
The optional attribute cpuset is a comma-separated list of physical CPU numbers that the domain process
and virtual CPUs can be pinned to by default. The pinning policy of the domain process and virtual CPUs
specified separately by the cputune element overrides any pinning specified in the vcpu element.
emulatorpin
The optional emulatorpin element specifies which of the host physical CPUs the "emulator", a subset of
a domain not including the vCPUs (that is, everything in the domain besides the vCPUs, such as the
vhost-net process), will be pinned to.
An example of vcpupin and emulatorpin is given in the following listing. In this mapping, a guest vCPU
is restricted to a single node on the host. Benchmark results show that such an allocation scheme can
result in slightly better performance.
Recommendation: Use cputune to limit the movement of vCPUs to a single NUMA node on the host.
<domain>
...
<cputune>
<vcpupin vcpu='0' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='1' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='2' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='3' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='4' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='5' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='6' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='7' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='8' cpuset='80,88,96,104,112'/>
…
…
<vcpupin vcpu='72' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='73' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='74' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='75' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='76' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='77' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='78' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='79' cpuset='120,128,136,144,152'/>
<emulatorpin cpuset='0,8,16,24,32,40,48,56,64,72,80'/>
</cputune>
...
</domain>
Listing 7: Detailed pinning of specific guest vCPUs and the emulator to physical processors on the host
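The 80 vcpupin entries are repetitive and error-prone to type by hand, so generating them with a short script can help. The sketch below is illustrative: the cpusets are the host cores on NUMA nodes 16 and 17 from the example, and the even 40/40 split is an assumption about the entries elided by the ellipsis in the listing.

```python
# Illustrative generator for a <vcpupin> block like the one above.
NODE16_CORES = '80,88,96,104,112'
NODE17_CORES = '120,128,136,144,152'

def vcpupin_block(total_vcpus=80):
    """Return one <vcpupin> line per guest vCPU, pinning the first half
    to the node 16 cores and the second half to the node 17 cores
    (an assumed split; adjust for your own host topology)."""
    lines = []
    for v in range(total_vcpus):
        cpuset = NODE16_CORES if v < total_vcpus // 2 else NODE17_CORES
        lines.append("<vcpupin vcpu='%d' cpuset='%s'/>" % (v, cpuset))
    return lines

block = vcpupin_block()
print(block[0])   # <vcpupin vcpu='0' cpuset='80,88,96,104,112'/>
print(block[-1])  # <vcpupin vcpu='79' cpuset='120,128,136,144,152'/>
```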
I/O tuning (virtio)
There are a few basic tuning options you can apply to your guest disk, whether it is backed by a block
device, a raw file, or a qcow2 file.
virtio
Virtio is a virtualization standard for network and disk device drivers where just the guest's device driver
"knows" it is running in a virtual environment, and cooperates with the hypervisor. This enables guests to
get high-performance network and disk operations, and gives most of the performance benefits of
paravirtualization.
For best performance, the virtio model is recommended over the Power Architecture Platform Reference
(PAPR) SCSI model or an emulated device, because virtio has been tuned to perform well in KVM
environments. To use it, set the disk bus type to virtio.
Recommendation: In the disk element section, set bus='virtio'.
cache
The following guest disk caching modes are supported:

Cache mode            Description                                Performance impact
cache=writethrough    Enables host cache; disables guest cache   Improves reads, impacts writes
cache=none            Disables host cache; enables guest cache   Improves writes, impacts reads
cache=writeback       Enables both host and guest caches         I/O is improved, but data might be lost

Table 2: Options for the cache parameter
The cache=none mode is usually a good general-purpose option for performance, unless read performance
is critical, in which case cache=writethrough might be the best choice.
Recommendation: In the disk element section, set cache=none.
io
Setting the I/O type can help further tune the guest disk device for best performance. io=native uses
kernel AIO (asynchronous I/O), whereas io=threads uses host user-space threads; typically, io=native
provides better performance.
Recommendation: In the disk element section, set io='native'.
An example of using the bus, cache, and io options in the XML specification of the guest is given in
the following listing.
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<source dev='/dev/mapper/mpathb'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
Listing 8: Example of guest I/O tuning
Memory tuning
The element, numatune, specifies the size, location, and allocation policy for the memory that is allocated
to the guest. The memory element specifies the host nodes that are allocated to the guest and how the
memory from these host nodes is allocated. MariaDB performs slightly better with memory mode set to
interleave. An example of setting the memory mode to interleave between two NUMA nodes is given in
the following listing.
Recommendation: In the numatune element section, set memory mode='interleave'.
<domain>
...
<numatune>
<memory mode='interleave' nodeset='16-17'/>
</numatune>
...
</domain>
Listing 9: An example of using the numatune element
Network tuning
The virtio model provides better performance than the PAPR-based virtual LAN (VLAN) model or an
emulated device because virtio has been tuned to perform well in KVM environments. An example of how
to set the network model type to virtio is given in the following listing.
<interface type="bridge">
<source bridge='brnet1'/>
<target dev='guest001'/>
<mac address='AA:BB:CC:11:22:33'/>
<model type='virtio'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
</interface>
Listing 10: An example of network tuning
Sample guest XML configuration
In this section, a sample XML configuration file which contains the definition of a PowerKVM guest is
provided (refer to the following listing). This configuration can be used as a template and adjustments can
be made to suit a specific need. This configuration defines a guest with the following characteristics:
The host has two sockets, each socket with two NUMA nodes and each node with five cores.
The guest topology defines one socket with 10 cores, and SMT is set to 8. In this allocation, 80 vCPUs
are allocated to the guest.
<domain type='kvm' id='20'>
<name>p215vm135</name>
<uuid>21513337-34a5-4a0b-bc83-e9e1451c3500</uuid>
<memory unit='KiB'>131072000</memory>
<currentMemory unit='KiB'>131072000</currentMemory>
<vcpu placement='static' cpuset='80,88,96,104,112,120,128,136,144,152'>80</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='1' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='2' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='3' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='4' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='5' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='6' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='7' cpuset='80,88,96,104,112'/>
<vcpupin vcpu='8' cpuset='80,88,96,104,112'/>
…
…
<vcpupin vcpu='72' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='73' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='74' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='75' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='76' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='77' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='78' cpuset='120,128,136,144,152'/>
<vcpupin vcpu='79' cpuset='120,128,136,144,152'/>
<emulatorpin cpuset='80,88,96,104,112,120,128,136,144,152'/>
</cputune>
<numatune>
<memory mode='interleave' nodeset='16-17'/>
</numatune>
<resource>
<partition>/machine</partition>
</resource>
<os>
<type arch='ppc64' machine='pseries-2.2'>hvm</type>
<boot dev='hd'/>
<boot dev='cdrom'/>
</os>
<features>
<acpi/>
<apic/>
<pae/>
</features>
<cpu>
<topology sockets='1' cores='10' threads='8'/>
</cpu>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/bin/qemu-kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source file='/dev/VM_Storage/21513337-34a5-4a0b-bc83-e9e1451c3500-0.img'/>
<backingStore/>
<target dev='sda' bus='virtio'/>
<alias name='scsi0-0-0-0'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/var/lib/libvirt/images/ubuntu-14.04.1-server-ppc64el.iso'/>
<backingStore/>
<target dev='sdc' bus='virtio' tray='open'/>
<readonly/>
<alias name='scsi0-0-0-2'/>
<address type='drive' controller='0' bus='0' target='0' unit='2'/>
</disk>
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<source dev='/dev/mapper/mpathb'/>
<backingStore/>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
<controller type='usb' index='0'>
<alias name='usb0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
</controller>
<controller type='pci' index='0' model='pci-root'>
<alias name='pci.0'/>
</controller>
<controller type='scsi' index='0'>
<alias name='scsi0'/>
<address type='spapr-vio' reg='0x2000'/>
</controller>
<interface type='bridge'>
<mac address='52:54:00:74:89:c1'/>
<source bridge='brenP3p5s0f0'/>
<target dev='vnet0'/>
<model type='virtio'/>
<alias name='net0'/>
<address type='pci' reg='0x1000'/>
</interface>
<serial type='pty'>
<source path='/dev/pts/1'/>
<target type='isa-serial' port='0'/>
<alias name='serial0'/>
<address type='spapr-vio' reg='0x30001000'/>
</serial>
<console type='pty' tty='/dev/pts/1'>
<source path='/dev/pts/1'/>
<target type='serial' port='0'/>
<alias name='serial0'/>
<address type='spapr-vio' reg='0x30001000'/>
</console>
<input type='mouse' bus='usb'>
<alias name='input0'/>
</input>
<input type='keyboard' bus='usb'>
<alias name='input1'/>
</input>
<input type='tablet' bus='usb'>
<alias name='input2'/>
</input>
<graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0'>
<listen type='address' address='0.0.0.0'/>
</graphics>
<video>
<model type='vga' vram='9216' heads='1'/>
<alias name='video0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</video>
<memballoon model='virtio'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</memballoon>
</devices>
<seclabel type='dynamic' model='selinux' relabel='yes'>
<label>system_u:system_r:svirt_t:s0:c422,c425</label>
<imagelabel>system_u:object_r:svirt_image_t:s0:c422,c425</imagelabel>
</seclabel>
</domain>
Listing 11: Sample PowerKVM configuration file to set up a KVM guest with 80 vCPUs
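After editing a template like Listing 11, it is easy to leave the XML malformed, and libvirt will reject such a file at define time. A quick well-formedness check with the Python standard library can catch this early; this is a sketch only, as `virsh define` performs the authoritative validation:

```python
import xml.etree.ElementTree as ET

def guest_name_if_well_formed(domain_xml):
    """Parse the domain XML and return the guest <name>, or raise
    ParseError if the document is not well formed."""
    return ET.fromstring(domain_xml).findtext('name')

sample = "<domain type='kvm'><name>p215vm135</name></domain>"
print(guest_name_if_well_formed(sample))  # p215vm135
```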
Summary
This paper described the deployment and tuning of MariaDB on Linux on Power and PowerKVM. The
interactions between the application and the system components are complex, and the guidelines
provided in this paper simplify the task of improving the performance of MariaDB running on Linux on
Power and PowerKVM. References for a more detailed treatment of these topics are provided in the
following section.
Acknowledgements
Thanks to the following people for their contributions to this paper.
Jenifer Hopper is a performance analyst in the IBM Systems and Technology Group Linux Technology
Center. You can reach Jenifer at [email protected].
Mark Nellen is a program manager in IBM Systems and Technology Group, ISV Enablement organization.
You can reach Mark at [email protected].
Maya Pandya is a technology manager in IBM Systems and Technology Group, ISV Enablement
organization. You can reach Maya at [email protected].
Michael (Monty) Widenius is an advisor and a board member of MariaDB Corporation, and is also a
member of the board of directors of the MariaDB Foundation. You can reach Monty at [email protected].
Basu Vaidyanathan is a performance analyst in IBM Systems and Technology Group Performance
Analysis organization. You can reach Basu at [email protected].
Resources
The following websites provide useful references to supplement the information contained in this paper:
MariaDB Foundation
www.MariaDB.org/
MariaDB Corporation official website
www.MariaDB.com/
IBM Systems on PartnerWorld
ibm.com/partnerworld/systems
IBM Power Systems
ibm.com/systems/in/power/?lnk=mhpr
IBM Linux on Power – resources
ibm.com/systems/power/software/linux/resources.html
IBM Power Systems hardware documentation
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp
About the authors
This paper is the result of a collaborative effort by MariaDB Corporation, MariaDB Foundation, and IBM.
Hari Reddy is a technical consultant in IBM Systems and Technology Group ISV Enablement
organization. You can reach Hari at [email protected].
Axel Schwenke is a performance analyst at MariaDB Corporation. You can reach Axel at
Sergey Vojtovich is a software developer at the MariaDB Foundation. You can reach Sergey at
Trademarks and special notices
© Copyright IBM Corporation 2014.
References in this document to IBM products or services do not imply that IBM intends to make them
available in every country. Information is provided "AS IS" without warranty of any kind.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked
terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these
symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information
was published. Such trademarks may also be registered or common law trademarks in other countries. A
current list of IBM trademarks is available on the Web at "Copyright and trademark information" at
www.ibm.com/legal/copytrade.shtml.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Information concerning non-IBM products was obtained from a supplier of these products, published
announcement material, or other publicly available sources and does not constitute an endorsement of
such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly
available information, including vendor announcements and vendor worldwide homepages. IBM has not
tested these products and cannot confirm the accuracy of performance, capability, or any other claims
related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the
supplier of those products.
Some information addresses anticipated future capabilities. Such information is not intended as a definitive
statement of a commitment to specific levels of performance, function or delivery schedules with respect to
any future products. Such commitments are only made in IBM product announcements. The information is
presented here to communicate IBM's current investment and development activities as a good faith effort
to help with our customers' future planning.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled
environment. The actual throughput or performance that any user will experience will vary depending upon
considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the
storage configuration, and the workload processed. Therefore, no assurance can be given that an
individual user will achieve throughput or performance improvements equivalent to the ratios stated here.
Any references in this information to non-IBM websites are provided for convenience only and do not in
any manner serve as an endorsement of those websites. The materials at those websites are not part of
the materials for this IBM product and use of those websites is at your own risk.