linux io internals for database administrators (scale 2017 and pgday nordic 2017 version)

19
Linux IO internals for database administrators Ilya Kosmodemiansky [email protected]

Upload: postgresql-consulting

Post on 21-Apr-2017

125 views

Category:

Engineering


0 download

TRANSCRIPT

Page 1: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

Linux IO internalsfor database administrators

Ilya [email protected]

Page 2: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

1 Why this talk

• Linux is a most common OS for databases• DBAs often run into IO problems• Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 3: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

2 The main IO problem for databases

• How to maximize page throughput between memory anddisks

• Things involved:I DisksI MemoryI CPUI IO SchedulersI FilesystemsI Database itself

• IO problems for databases are not always only about disks

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 4: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

3 A typical database

DRAM

Disks

Shared memory

Database

Linux

Page cache

User space

Kernel space

WAL bu�er

WAL Data�le

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 5: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

4 Key things about such workload

• Shared memory segment can be very large• Keeping in-memory pages synchronized with disk generateshuge IO

• WAL should be written fast and safe• One and every layer of OS IO stack involved

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 6: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

5 Memory allocation and mapping

CPU L1

MMU TLB

Memory

L2 L3

pagetable

Virtual addressing

Translation

Physical addressing

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 7: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

6 DBAs takeaways:

• Database with huge shared memory segment benefits fromhuge pages

• But not from transparent huge pagesI Databases operate large continuous shared memory segmentsI THP defragmentation can lead to severe performance

degradation in such cases

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 8: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

7 Freeing memory

Page cache

Swap

Free list Free list Free list

alloc()free()

page-out

vm.min_free_kbytes

reclaim

vm.swapiness

OOM-killer

vm.panic_on_oom

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 9: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

8 Page-out

• Page-out happens if:I someone calls fsyncI 30 sec timeout exceeded (vm.dirty_expire_centisecs)I Too many dirty pages (vm.dirty_background_ratio and

vm.dirty_ratio)

• It is reasonable to tune vm.dirty_* on rotating disks withRAID controller, but helps a little on server class SSDs

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 10: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

9 DBAs takeaways:

• vm.overcommit_memoryI 0 - heuristic overcommit, reduces swap usageI 1 - always overcommitI 2 - do not overcommit (vm.overcommit_ratio = 50 by

default)

• vm.min_free_kbytes - reasonably high (can be easy 1000000on a server with enough memory)

• vm.swappiness = 1I 0 - swap disabledI 60 - defaultI 100 - swap preffered instead of other reaping mechanisms

• Your database would not like OOM-killer

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 11: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

10 OOM-killer: plague and cholera

• vm.panic_on_oom effectively disables OOM-killer, but that isprobably not the result you desire

• Or for a certain process: echo -17 > /proc/12465/oom_adjbut again

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 12: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

11 Filesystems: write barriers

Journal

Kernel bu�er

Journalated fylesystem

Data

Journal entryData

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 13: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

12 DBAs takeaways:

• ext4 or xfs• Disable write barrier (only if SSD/controller cache is protectedby capacitor/battery)

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 14: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

13 IO stack

Database memory

Page cacheVFSEXT4

Block device interfaceDisks

Direct IO

BIO Layer

Block IO

Request Layer Elevator/IO Scheduler

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 15: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

14 Elevators: before 2.6 kernel

• Linus Elevator - the only one in times of 2.4• merging and sorting request queues• Had lots of problems• The Linux Scheduler: a Decade of Wasted Cores – Lozi et al.2016

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 16: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

15 Elevators: between 2.6 and early 3.*

• CFQ - universal, default one• deadline - rotating disks• noop or none - then disks throughput is so high, that it cannot benefit from keen scheduling

I PCIe SSDsI SAN disk arrays

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 17: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

16 Elevators: 3.13 and newer

• Effectiveness of noop clearly shows ineffectiveness of others,or ineffectiveness of smart sorting as an approach

• blk-mq scheduler was merged into 3.13 kernel• Much better deals with parallelism of modern SSD - basicallyseparate IO queue for each CPU

• The best option for good SSDs right now

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 18: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

17 Thanks

to my collegues Alexey Lesovsky and Max Boguk for a lot ofresearch on this topic

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Page 19: Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

18 Questions?

[email protected]

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com