linux io internals for database administrators (scale 2017 and pgday nordic 2017 version)
TRANSCRIPT
Linux IO internalsfor database administrators
Ilya [email protected]
1 Why this talk
• Linux is a most common OS for databases• DBAs often run into IO problems• Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
2 The main IO problem for databases
• How to maximize page throughput between memory anddisks
• Things involved:I DisksI MemoryI CPUI IO SchedulersI FilesystemsI Database itself
• IO problems for databases are not always only about disks
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
3 A typical database
DRAM
Disks
Shared memory
Database
Linux
Page cache
User space
Kernel space
WAL bu�er
WAL Data�le
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
4 Key things about such workload
• Shared memory segment can be very large• Keeping in-memory pages synchronized with disk generateshuge IO
• WAL should be written fast and safe• One and every layer of OS IO stack involved
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
5 Memory allocation and mapping
CPU L1
MMU TLB
Memory
L2 L3
pagetable
Virtual addressing
Translation
Physical addressing
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
6 DBAs takeaways:
• Database with huge shared memory segment benefits fromhuge pages
• But not from transparent huge pagesI Databases operate large continuous shared memory segmentsI THP defragmentation can lead to severe performance
degradation in such cases
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
7 Freeing memory
Page cache
Swap
Free list Free list Free list
alloc()free()
page-out
vm.min_free_kbytes
reclaim
vm.swapiness
OOM-killer
vm.panic_on_oom
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
8 Page-out
• Page-out happens if:I someone calls fsyncI 30 sec timeout exceeded (vm.dirty_expire_centisecs)I Too many dirty pages (vm.dirty_background_ratio and
vm.dirty_ratio)
• It is reasonable to tune vm.dirty_* on rotating disks withRAID controller, but helps a little on server class SSDs
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
9 DBAs takeaways:
• vm.overcommit_memoryI 0 - heuristic overcommit, reduces swap usageI 1 - always overcommitI 2 - do not overcommit (vm.overcommit_ratio = 50 by
default)
• vm.min_free_kbytes - reasonably high (can be easy 1000000on a server with enough memory)
• vm.swappiness = 1I 0 - swap disabledI 60 - defaultI 100 - swap preffered instead of other reaping mechanisms
• Your database would not like OOM-killer
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
10 OOM-killer: plague and cholera
• vm.panic_on_oom effectively disables OOM-killer, but that isprobably not the result you desire
• Or for a certain process: echo -17 > /proc/12465/oom_adjbut again
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
11 Filesystems: write barriers
Journal
Kernel bu�er
Journalated fylesystem
Data
Journal entryData
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
12 DBAs takeaways:
• ext4 or xfs• Disable write barrier (only if SSD/controller cache is protectedby capacitor/battery)
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
13 IO stack
Database memory
Page cacheVFSEXT4
Block device interfaceDisks
Direct IO
BIO Layer
Block IO
Request Layer Elevator/IO Scheduler
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
14 Elevators: before 2.6 kernel
• Linus Elevator - the only one in times of 2.4• merging and sorting request queues• Had lots of problems• The Linux Scheduler: a Decade of Wasted Cores – Lozi et al.2016
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
15 Elevators: between 2.6 and early 3.*
• CFQ - universal, default one• deadline - rotating disks• noop or none - then disks throughput is so high, that it cannot benefit from keen scheduling
I PCIe SSDsI SAN disk arrays
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
16 Elevators: 3.13 and newer
• Effectiveness of noop clearly shows ineffectiveness of others,or ineffectiveness of smart sorting as an approach
• blk-mq scheduler was merged into 3.13 kernel• Much better deals with parallelism of modern SSD - basicallyseparate IO queue for each CPU
• The best option for good SSDs right now
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
17 Thanks
to my collegues Alexey Lesovsky and Max Boguk for a lot ofresearch on this topic
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
18 Questions?
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com