Linux Internals For MySQL DBAs
Ryan Lowe Marcos Albe Chris Giard
Daniel Nichter Syam Purnam
Emily Slocombe Le Peter Boros
Linux Kernel
• It’s big (almost 20 million lines of code)
• It’ll take you YEARS to be an expert
• Resources:
• linuxfromscratch.org
• kernel.org
• man pages
• lwn.net
/proc
• Processes and other system information in a hierarchical file-like structure
• Interaction between kernel space and user space
• Exposes kernel knobs and sliders
• Plain text files
• echo 'value' > /proc/you/want/adjusted
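The read/write pattern above can be sketched in Python. This is only an illustration: the helper names `read_knob` and `write_knob` are hypothetical, writing to a real kernel tunable requires root, and the example path in the comment is just one readable tunable among many.

```python
from pathlib import Path


def read_knob(path):
    """Return the current value of a /proc-style text file, without the trailing newline."""
    return Path(path).read_text().strip()


def write_knob(path, value):
    """Rough equivalent of `echo value > path`; real /proc entries need root to write."""
    Path(path).write_text(f"{value}\n")


# Example use on a Linux host (not executed here):
#   read_knob("/proc/sys/vm/swappiness")
```

Because the files are plain text, no special API is needed; ordinary file reads and writes are enough.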
ulimits
• Considered part of the security system
• For performance/operations we only care about open files (-n)
• For debugging, you might need to set the core dump size limit to unlimited (-c)
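The two limits mentioned above can be inspected from inside a process with Python's standard `resource` module. A minimal sketch, assuming a Unix host; the helper name `show_limit` is hypothetical.

```python
import resource


def show_limit(name, which):
    """Return a one-line description of the soft and hard values of one rlimit."""
    soft, hard = resource.getrlimit(which)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


# RLIMIT_NOFILE corresponds to `ulimit -n`, RLIMIT_CORE to `ulimit -c`.
print(show_limit("open files (-n)", resource.RLIMIT_NOFILE))
print(show_limit("core file size (-c)", resource.RLIMIT_CORE))
```

The values printed are those of the Python process itself; mysqld inherits its limits from whatever started it, so they may differ.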
Virtual File System (VFS)
• Abstraction layer to allow Linux to handle many filesystems.
• Provides a common interface to make your life easier.
• Introduces the Common File Model
ext2/3/4
• ext2: Old; No Journaling (SSD/Flash, maybe)
• ext3: ext2 + journaling + HTree Indexing
• ext4:
  • Large Volumes
  • Extents
  • Checksummed Journal
  • Nanosecond Timestamps
XFS
• Standard recommendation for DB workloads
• Highly proficient with parallel IO
• Mature but current (created at SGI in 1993; development is still active)
AIO
A method for performing IO in which the process that issues a request is not blocked until the data is available.
Instead, after submitting the request, the process continues executing its code and can check the status of the request later.
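The submit-then-check pattern described above can be illustrated in a few lines of Python. Note the substitution: this sketch uses a worker thread from `concurrent.futures` to stand in for the asynchronous request, not the kernel's native AIO interface. The function names and the temporary file are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Blocking read, executed on a worker thread."""
    with open(path, "rb") as f:
        return f.read()


def submit_and_poll(path):
    """Submit a read, keep working while it is in flight, then collect the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(read_file, path)  # submit the request; returns at once
        work_done = 0
        while not future.done():               # check the request status later
            work_done += 1                     # stand-in for useful work
        return future.result(), work_done


# Demo against a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"page data")
data, _ = submit_and_poll(tmp.name)
os.unlink(tmp.name)
print(data)
```

The point is the control flow: the caller is never parked waiting on the read, and decides for itself when to ask whether the request has completed.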
IO Schedulers
• noop
  • FIFO Queue w/Request Merging
• deadline
  • Impose a deadline on all operations
• cfq
  • Completely Fair Queueing
• anticipatory
  • “Anticipates” synchronous read operations
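On kernels that offer these schedulers, the choice for a block device is exposed in /sys/block/<dev>/queue/scheduler, with the active one shown in square brackets. A small sketch of parsing that file's contents follows; the function name `parse_scheduler` is hypothetical and the sample string is illustrative.

```python
def parse_scheduler(text):
    """Split the contents of /sys/block/<dev>/queue/scheduler into (active, available)."""
    available = []
    active = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the active scheduler
            active = token
        available.append(token)
    return active, available


# Sample content in the kernel's format; the bracketed entry is the active one.
print(parse_scheduler("noop deadline [cfq]"))
```

Writing one of the available names back to the same file (as root) switches the scheduler for that device.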
Storage: Disks
• Rotational latency: The delay waiting for the rotation of the disk to bring the required disk sector under the read-write head.
• Seek time: Time to move the read-write head from its current position to the desired track
• Access time (response time): How fast we can locate a position in a file (seek time + rotational latency)
• Transfer time (throughput): How fast we can get bytes from disk to RAM
IOPS Example
• Model: Western Digital VelociRaptor 2.5" SATA hard drive
• Rotational speed: 10,000 RPM
• Average rotational latency: 3 ms (0.003 seconds), i.e. half a rotation at 10,000 RPM
• Average seek time: (4.2 ms read + 4.7 ms write) / 2 = 4.45 ms (0.00445 seconds)
• Calculated IOPS for this disk: 1/(0.003 + 0.00445) = about 134 IOPS
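The arithmetic for this drive can be reproduced with a short sketch. The function names are hypothetical; the model is the simple one used here (one IO costs one average seek plus one average rotational latency, with transfer time ignored). With the read and write seek times averaged exactly (4.45 ms), the estimate comes out at roughly 134.

```python
def average_rotational_latency(rpm):
    """Average rotational latency in seconds: half of one full rotation."""
    return 0.5 * 60.0 / rpm


def estimated_iops(rpm, seek_read_ms, seek_write_ms):
    """Estimate random IOPS as 1 / (average seek + average rotational latency)."""
    avg_seek_s = (seek_read_ms + seek_write_ms) / 2 / 1000.0
    return 1.0 / (average_rotational_latency(rpm) + avg_seek_s)


# 10,000 RPM drive with 4.2 ms read and 4.7 ms write seek times.
print(round(estimated_iops(10_000, 4.2, 4.7)))
```

The same function makes it easy to see why rotational disks top out at a few hundred IOPS: even a 15,000 RPM drive only shaves one millisecond off the rotational term.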
RAID Alignment
http://www.mysqlperformanceblog.com/2011/06/09/aligning-io-on-a-hard-disk-raid-the-theory/
InnoDB Buffer Pool Size
• Allocated dynamically
  • unless you use innodb_buffer_pool_populate
• InnoDB checks at startup whether enough RAM is available
• Allow about 10% overhead on top of the configured size
Huge Pages
• Translation Lookaside Buffer (TLB)
• Default 4k page size
• Larger pages = fewer TLB entries needed to cover the same memory
• Huge Pages:
  • 2MB - commodity HW
  • 1GB - High End (1TB+)
• Huge Pages are good for a huge Buffer Pool
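To get a feel for the numbers, here is a rough sketch of how many 2MB huge pages a given buffer pool would need. The function name is hypothetical, and applying a 10% overhead to the huge page reservation is an assumption made for illustration, not a rule from the slides.

```python
import math

HUGE_PAGE_BYTES = 2 * 1024 * 1024  # 2MB huge pages, the commodity-hardware size


def huge_pages_needed(buffer_pool_bytes, overhead=0.10):
    """Number of 2MB huge pages covering the buffer pool plus a fractional overhead."""
    total = buffer_pool_bytes * (1 + overhead)
    return math.ceil(total / HUGE_PAGE_BYTES)


# A 64 GiB buffer pool, with and without the assumed 10% overhead.
pool = 64 * 1024 ** 3
print(huge_pages_needed(pool, overhead=0.0))
print(huge_pages_needed(pool))
```

A count like this is what would be reserved through the vm.nr_hugepages sysctl. For comparison, mapping the same 64 GiB with 4k pages takes 512 times as many page table entries.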
NUMA
• Each physical die is a NUMA node
• introspection: numactl --hardware / numactl --show
• Memory access cost: O(1) for local, O(log(N)) for remote
NUMA
• Disable MySQL NUMA affinity / pinning except:
  • When you run multiple mysqld instances on the same system
  • AND you have data for each instance separated on different PCIe cards
  • AND the PCIe cards are local to different CPU sockets
• Percona implements Jeremy Cole’s suggestions from http://blog.jcole.us/2012/04/16/a-brief-update-on-numa-and-mysql/
CPU Governors
• Control CPU speed and power consumption
• # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
• control through /sys (the cpufreq scaling_governor file), /etc/init.d/cpuspeed, or cpupower
• More: http://forum.xda-developers.com/showthread.php?t=1736168
• Most distros use ondemand by default (/etc/default/cpupower)
• We care about ondemand, performance, powersave. We usually want performance.
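A quick way to check every core at once is to read each CPU's scaling_governor file under sysfs. The sketch below assumes the standard cpufreq layout under /sys/devices/system/cpu; the function names are hypothetical, and on machines without cpufreq support the result is simply empty.

```python
from pathlib import Path


def current_governors(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (cpu0, cpu1, ...) to its active cpufreq governor."""
    result = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name
        result[cpu_name] = gov_file.read_text().strip()
    return result


def not_in_performance(governors):
    """Return the CPUs whose governor is something other than 'performance'."""
    return sorted(cpu for cpu, gov in governors.items() if gov != "performance")


# Example use on a Linux host (not executed here):
#   not_in_performance(current_governors())
```

An empty list from `not_in_performance` means every core that exposes cpufreq is already set to performance.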
Backlog
• Queue for new TCP connections
• MySQL: back_log
• Linux: tcp_max_syn_backlog
• sysctl -w net.ipv4.tcp_max_syn_backlog=4096
• sysctl -w net.core.somaxconn=1024
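These settings interact: the backlog a server passes to listen() is capped by net.core.somaxconn, so raising MySQL's back_log alone may have no effect. The sketch below makes that cap explicit; the function names are hypothetical and the numbers are only examples.

```python
def effective_accept_backlog(back_log, somaxconn):
    """The kernel caps the listen() backlog at net.core.somaxconn."""
    return min(back_log, somaxconn)


def backlog_warning(back_log, somaxconn):
    """Return a warning string when back_log is being truncated, else None."""
    if back_log > somaxconn:
        return (f"back_log={back_log} is capped at somaxconn={somaxconn}; "
                f"raise net.core.somaxconn to at least {back_log}")
    return None


print(effective_accept_backlog(2048, 1024))
print(backlog_warning(2048, 1024))
```

With somaxconn at 1024 as in the sysctl line above, any back_log value beyond 1024 would be truncated to 1024.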
Slow Links
• /proc/sys/net/core/wmem_max = 104857600
• /proc/sys/net/core/rmem_max = 104857600
• /proc/sys/net/ipv4/tcp_wmem = 4096 86400 66060288
• /proc/sys/net/ipv4/tcp_rmem = 8192 86400 66060288
• /proc/sys/net/ipv4/tcp_mem = 104857600 104857600 104857600
• /proc/sys/net/ipv4/tcp_window_scaling = 1
• /proc/sys/net/ipv4/tcp_sack = 1
• /proc/sys/net/ipv4/tcp_timestamps = 1
• /proc/sys/net/ipv4/tcp_no_metrics_save = 0
• /proc/sys/net/ipv4/tcp_moderate_rcvbuf = 1
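Values written through /proc do not survive a reboot, so settings like these usually end up in /etc/sysctl.conf as well. The sketch below converts a tunable's path into the dotted key that sysctl uses, assuming the tunables live under /proc/sys as they do on Linux. The function names are hypothetical.

```python
PROC_SYS_PREFIX = "/proc/sys/"


def proc_path_to_sysctl_key(path):
    """Turn /proc/sys/net/ipv4/tcp_sack into the sysctl key net.ipv4.tcp_sack."""
    if not path.startswith(PROC_SYS_PREFIX):
        raise ValueError(f"not a /proc/sys tunable: {path}")
    return path[len(PROC_SYS_PREFIX):].replace("/", ".")


def sysctl_conf_line(path, value):
    """Render one `key = value` line in sysctl.conf format."""
    return f"{proc_path_to_sysctl_key(path)} = {value}"


print(sysctl_conf_line("/proc/sys/net/core/wmem_max", 104857600))
print(sysctl_conf_line("/proc/sys/net/ipv4/tcp_wmem", "4096 86400 66060288"))
```

The resulting lines can be loaded with `sysctl -p` once they are in the configuration file.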
High QPS
• /proc/sys/net/ipv4/ip_local_port_range = 15000 61000
• /proc/sys/net/ipv4/tcp_max_tw_buckets = 2000000
• /proc/sys/net/ipv4/tcp_tw_reuse = 1
• /proc/sys/net/ipv4/tcp_syncookies = 1
• /proc/sys/net/core/wmem_default = 135168
• /proc/sys/net/core/rmem_default = 135168
• /proc/sys/net/ipv4/tcp_wmem = 4096 86384 104857600
• /proc/sys/net/ipv4/tcp_rmem = 8192 86384 104857600
• /proc/sys/net/ipv4/tcp_mem = 104857600 104857600 104857600
• /proc/sys/net/core/rmem_max = 104857600
• /proc/sys/net/core/wmem_max = 104857600
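The port range and TIME_WAIT settings matter mostly on the client side of short-lived connections, for example an application server that opens a new connection to MySQL for every request. A back-of-the-envelope sketch follows, assuming the port range is inclusive, that Linux keeps closed sockets in TIME_WAIT for 60 seconds, and that tcp_tw_reuse is not helping. The function names are hypothetical.

```python
TIME_WAIT_SECONDS = 60  # assumed TIME_WAIT duration on Linux


def ephemeral_port_count(low, high):
    """Number of local ports in an inclusive ip_local_port_range."""
    return high - low + 1


def sustainable_connects_per_second(low, high, time_wait=TIME_WAIT_SECONDS):
    """Rough ceiling on new connections/second from one client IP to one server ip:port."""
    return ephemeral_port_count(low, high) / time_wait


print(ephemeral_port_count(15000, 61000))
print(int(sustainable_connects_per_second(15000, 61000)))
```

Under these assumptions even the widened range sustains only several hundred new connections per second per client and destination pair, which is why tcp_tw_reuse and connection pooling are common companions to this setting.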