Deploying MongoDB in Production
Monday, November 5, 2018 9:00 AM - 12:00 PM, Bull

TRANSCRIPT

Page 1: Deploying MongoDB in Production - Percona

Deploying MongoDB in Production

Monday, November 5, 2018 9:00 AM - 12:00 PM Bull

Page 2:

About us

Page 3:

Agenda

● Hardware and OS configuration
● MongoDB in Production
● Backups and Monitoring
● Q&A

Page 4:


● Data
○ Document: single BSON object, often nested
○ Field: single field in a document
○ Collection: grouping of documents
○ Database: grouping of collections
○ Capped Collection: a fixed-size FIFO collection

● Replication
○ Oplog: A special capped collection for replication
○ Primary: A replica set node that can receive writes
○ Secondary: A replica of the Primary that is read-only
○ Voting: A process to elect a Primary node
○ Hidden-Secondary: A replica that cannot become Primary

Terminology

Page 5:

Common database architecture

Page 6:

On site (local DC)

● Buy the biggest machine possible, with plenty of fancy disks and memory.
● Retire the equipment 5 years later because it is outdated.
● Open source was available, but proprietary software was also very present in companies.
● Buy fiber links, more than one for safety.
● Huge and heavy UPS units
● Router configuration

Page 7:

On the public cloud

● Rent or reserve a machine according to your needs.
● No upfront investment
● No huge hydro bills
● No need to run cables or buy links

In the end, someone else did all of that for you.

Page 8:

On the private cloud

● Considerable upfront investment
● Still need to configure the hardware and buy links and disks.
● Abstraction layer to configure virtual machines, usually through a platform.

Page 9:

Beyond the cloud

● Use the database as a service.
● Scale up and down with a few commands
● It no longer matters where the service is running
● Docker/Mesosphere/Kubernetes and many more...

Page 10:

Hardware configuration

Page 11:

Hardware configuration

● Not all of these options are available in public clouds, private clouds and managed services.

Page 12:

Disks

Page 13:

Disks

● A crucial resource for any database or system.
● Databases need disks to persist data.

● Options available:

Magnetic disks

SSD/NVMe

Page 14:

● What is RAID?

● If using RAID, prefer RAID 1+0

● Avoid RAID5 and RAID6 (write performance penalty)

Disk Configuration

Page 15:

Disks - Configuration

Performance: High

Redundancy: None

Overhead/parity : None

Page 16:

Disks - Configuration

Performance: Low

Redundancy: Yes

Overhead/parity : Yes

Page 17:

Disks - Configuration

Performance: High

Redundancy: None

Overhead/parity : Yes

Page 18:

Disks - Configuration

Performance: High

Redundancy: Yes

Overhead/parity : Yes

source: http://www.icc-usa.com/raid-calculator.html

Page 19:

Disks - Configuration

For cloud services, local (instance) storage offers the best throughput for the price, but remember: once the machine is stopped or replaced, all the data is erased.

It is not common to see RAID 10 in cloud environments; replica sets keep the same data across different nodes, and in case of failure all we need to do is start a new box.

Page 20:

Good Practices:

● Use a different disk for the data folder.
● If possible, move the journal to a different disk.
● SSDs will give better performance than spinning disks.
● If using EBS, consider the io2 family to guarantee IOPS

Disks - Configuration

Page 21:

Warnings:

● EBS without PIOPS may hit its IOPS limits in the middle of the business day, slowing down the application.

● Do not share the same storage disks between replica set members.
● NFS/remote disks will slow down the database and tend to have more issues than local/fiber-connected disks

Disks - Configuration

Page 22:

Disk Scheduler

● Disk scheduler may affect the database performance.

● Most common disk schedulers are:

NOOP

DEADLINE

CFQ

Page 23:

Disk Scheduler - NOOP

First Come First Serve

Page 24:

Disk Scheduler - Deadline

Reads have priority. Writes are queued, as they can happen asynchronously.

This scheduler tries to speed up reads, as the application may need the data to return results to the client.

Page 25:

Disk Scheduler - Complete Fair Queue

Time slice per process

Page 26:

Disk Scheduler

Page 27:

Disk Filesystems

● Filesystem Types
○ Use XFS or EXT4
■ Use XFS only on WiredTiger
■ EXT4 “data=ordered” mode recommended
○ Btrfs not tested, yet!

● Filesystem Options
○ Set ‘noatime’ on MongoDB data volumes in ‘/etc/fstab’
○ Remount the filesystem after an options change, or reboot
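The fstab example from the slide did not survive the transcript; as a hedged sketch, a 'noatime' entry looks like this (the device, filesystem and mount point below are placeholders, not from the talk):

```shell
# /etc/fstab - example entry for a dedicated MongoDB data volume
# (placeholder device and mount point; adjust to your system)
# /dev/sdb1  /var/lib/mongo  xfs  defaults,noatime  0 0

# Remount without a reboot after editing fstab:
sudo mount -o remount /var/lib/mongo
```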

Page 28:

Disk Readahead

● Spinning disks may be slow to read data

● Setting readahead may improve read performance, at the cost of possibly loading data into memory that will never be used

● We recommend ‘32’ blocks of read-ahead (16k)

Page 29:

Disk Readahead

● Change the read-ahead by adding a file to ‘/etc/udev/rules.d’, e.g. /etc/udev/rules.d/60-mongodb-disk.rules:

# set deadline scheduler and 32/16kb read-ahead for /dev/sda
ACTION=="add|change", KERNEL=="sda", ATTR{queue/scheduler}="deadline", ATTR{bdi/read_ahead_kb}="16"

● Or use sudo blockdev --getra and sudo blockdev --setra
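The same setting can be applied at runtime with blockdev; a sketch (the device name /dev/sda is an assumption, and blockdev counts in 512-byte sectors, so 32 sectors = 16 KB):

```shell
# Show the current read-ahead, in 512-byte sectors
sudo blockdev --getra /dev/sda

# Set read-ahead to 32 sectors (32 * 512 bytes = 16384 bytes = 16 KB)
sudo blockdev --setra 32 /dev/sda
```

Note: unlike the udev rule, this does not persist across reboots.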

Page 30:

Processor

Page 31:

Hardware: CPUs

● Cores vs Core Speed

○ Faster cores don't necessarily mean a faster database.
○ Almost all databases take advantage of multiple cores for good performance.

Page 32:

● CPU Frequency Scaling - Power Save

● Check that the database server is not configured with a power-saving profile.

● This configuration may reduce processor performance and, with it, database performance.

Hardware: CPUs - Checks

Page 33:

Hardware: CPUs - vCPU

● If using virtual machines, check the vCPU speed.
● Some public clouds cap the maximum speed at something like 1.2 GHz
● Avoid overcommitting resources, memory and CPU

Page 34:

Memory

Page 35:

Tuning Linux: NUMA

● A memory architecture that takes into account the locality of memory, caches and CPUs for lower latency

● MongoDB codebase is not NUMA “aware”, causing unbalanced memory allocations on NUMA systems

● Disable NUMA
○ In the server BIOS
○ Using ‘numactl’ in init scripts BEFORE the ‘mongod’ command (recommended for future compatibility):

numactl --interleave=all /usr/bin/mongod <other flags>
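On systemd distributions, the same wrapping can be sketched as a drop-in override (the drop-in path, unit name and binary locations below are assumptions; PSMDB/MongoDB packages may already handle this):

```shell
# Create a drop-in that prefixes mongod with numactl (assumed paths)
sudo mkdir -p /etc/systemd/system/mongod.service.d
sudo tee /etc/systemd/system/mongod.service.d/numa.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/numactl --interleave=all /usr/bin/mongod --config /etc/mongod.conf
EOF

# Reload systemd and restart the service
sudo systemctl daemon-reload && sudo systemctl restart mongod
```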

Page 36:

Tuning Linux: NUMA

Page 37:

Tuning Linux: Transparent HugePages

● Introduced in RHEL/CentOS 6, Linux 2.6.38+
● Merges memory pages in the background (khugepaged process)
● Decreases overall performance when used with MongoDB!
● “AnonHugePages” in /proc/meminfo shows usage
● Disable TransparentHugePages!
● Add “transparent_hugepage=never” to the kernel command line (GRUB)
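For a runtime change before the next reboot, the usual sysfs toggles can be used (these are the standard paths on RHEL/CentOS; persist the setting via the GRUB parameter as noted above):

```shell
# Disable THP immediately (affects new allocations only)
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

# Verify: the active value is shown in brackets, e.g. "always madvise [never]"
cat /sys/kernel/mm/transparent_hugepage/enabled
```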

Page 38:

Kernel

Page 39:

Tuning Linux: The Linux Kernel

● Linux 2.6.x?
● Avoid Linux earlier than 3.10.x - 3.12.x
● Large improvements in parallel efficiency in 3.10+ (for free!)

More: https://blog.2ndquadrant.com/postgresql-vs-kernel-versions/

Page 40:

● Allows per-Linux-user resource constraints
○ Number of user-level processes
○ Number of open files
○ CPU seconds
○ Scheduling priority

● MongoDB
○ Should probably have a dedicated VM, container or server
○ Creates a new process
○ Creates an open file for each active data file on disk

Tuning Linux: Ulimit

Page 41:

Tuning Linux: Swappiness

● A Linux kernel sysctl setting for preferring RAM or disk for swap
○ Linux default: 60
○ To avoid disk-based swap: 1 (not zero!)
○ To allow some disk-based swap: 10
○ ‘0’ can cause more swapping than ‘1’ on recent kernels

More on this here: https://www.percona.com/blog/2014/04/28/oom-relation-vm-swappiness0-new-kernel/
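Applying the recommended value can be sketched as follows (the sysctl.d file name is an arbitrary choice, not from the talk):

```shell
# Apply immediately
sudo sysctl vm.swappiness=1

# Persist across reboots (file name is an arbitrary choice)
echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/99-mongodb.conf
```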

Page 42:

Tuning Linux: Ulimit

● Setting ulimits
○ /etc/security/limits.d file
○ systemd service
○ Init script

● Ulimits are set by Percona and MongoDB packages!
○ Example: PSMDB RPM (systemd)

Page 43:

Tuning Linux: Time Source

● Replication and clustering need consistent clocks (before 3.6)
○ mongodb_consistent_backup relies on time sync, for example!

● Use a consistent time source/server
○ “It’s ok if everyone is equally wrong”

● Non-Virtualized
○ Run an NTP daemon on all MongoDB and monitoring hosts
○ Enable the service so it starts on reboot

● Virtualized
○ Check if your VM platform has an “agent” syncing time
○ VMware and Xen are known to have their own time sync
○ If no time sync is provided, install an NTP daemon

Page 44:

Tuning Linux: Time Source

Page 45:

Network

Page 46:

Tuning Linux: Network Stack

● Defaults are not good for > 100 Mbps Ethernet
● Suggested starting point:

● Set network tunings:
○ Add the above sysctl tunings to /etc/sysctl.conf
○ Run “/sbin/sysctl -p” as root to set the tunings
○ Run “/sbin/sysctl -a” to verify the changes

● Listen Backlog = 128 (Mongo parameter)
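The slide's actual "starting point" values did not survive the transcript. As a hedged illustration only, tunings of this general shape are often suggested for MongoDB hosts — the keys below are real sysctls, but the values are assumptions to benchmark, not the presenter's figures:

```shell
# Illustrative /etc/sysctl.conf additions - validate for your workload
# net.core.somaxconn = 4096
# net.ipv4.tcp_fin_timeout = 30
# net.ipv4.tcp_keepalive_intvl = 30
# net.ipv4.tcp_keepalive_time = 120
# net.ipv4.tcp_max_syn_backlog = 4096

# Load the file, then verify
sudo /sbin/sysctl -p
/sbin/sysctl -a | grep -E 'somaxconn|tcp_keepalive_time'
```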

Page 47:

Hardware: Network Infrastructure

● Datacenter Tiers

○ Network Edge
○ Public Server VLAN
○ Backend Server VLAN
○ Data VLAN

Page 48:

Hardware: Network Infrastructure

● Network Fabric
○ Try to use 10GbE for low latency
○ Use Jumbo Frames for efficiency
○ Try to keep all MongoDB nodes on the same segment

● Outbound / Public Access
○ Databases don’t need to talk to the internet*

● Cloud?
○ Try to replicate the above with features of your provider

Page 49:

Tuning Linux: More on this...

https://www.percona.com/blog/2016/08/12/tuning-linux-for-mongodb/

Page 50:

Storage Engine and Installation

Page 51:

Tuning MongoDB: WiredTiger

● WT syncs data to disk in a process called “Checkpointing”:
○ Every 60 seconds or >= 2GB of data changes

● In-memory buffering of the journal
○ Journal buffer size: 128 KB
○ Synced every 50 ms (as of 3.2)
○ Or on every change with the Journaled write concern
○ While journal records remain in the buffer between write operations, updates can be lost following a hard shutdown!

Page 52:

Tuning MongoDB: Storage Engine Caches

● WiredTiger
○ In-heap cache
■ ~50% of available system memory
■ Uncompressed WT pages
○ Filesystem cache
■ ~50% of available system memory
■ Compressed pages

Page 53:

Tuning MongoDB: Durability


● WiredTiger - Default since 3.2

● storage.journal.enabled = <true/false>
○ Always enable unless data is transient - default true
○ Always enable on cluster config servers

● storage.journal.commitIntervalMs = <ms>
○ Max time between journal syncs - default 100ms

● storage.syncPeriodSecs = <secs>
○ Max time between data file flushes - default 60 seconds
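The same settings expressed as a mongod.conf fragment (defaults shown, matching the values above):

```yaml
# mongod.conf - durability settings (defaults per the slide)
storage:
  syncPeriodSecs: 60          # max time between data file flushes
  journal:
    enabled: true             # always enable unless data is transient
    commitIntervalMs: 100     # max time between journal syncs
```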

Page 54:

Security

“Think of the network like a public place” ~ Unknown

Page 55:

Security: Authorization

● Always enable auth on Production Installs!
● Do not use weak passwords
● Minimum access policy
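A minimal sketch of what enabling auth and a least-privilege user looks like (the user, password and database names are placeholders invented for illustration):

```shell
# mongod.conf:
#   security:
#     authorization: enabled

# Create an application user with only the access it needs (mongo shell):
mongo admin --eval '
  db.createUser({
    user: "appUser",
    pwd: "use-a-strong-password",
    roles: [ { role: "readWrite", db: "appdb" } ]
  })'
```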

Page 56:

Default Roles

● read

● readWrite

● dbAdmin

● dbOwner

● userAdmin

● clusterAdmin

● clusterMonitor

● clusterManager

● hostManager

● backup

● restore

● readAnyDatabase

● readWriteAnyDatabase

● userAdminAnyDatabase

● dbAdminAnyDatabase

● root

● __system

Page 57:

Security: Filesystem Access

● Use a service user+group
○ ‘mongod’ or ‘mongodb’ on most systems
○ Ensure the data path, log file and key file(s) are owned by this user+group

● Data Path
○ Mode: 0750

● Log File
○ Mode: 0640
○ Contains real queries and their fields!

● Key File(s)
○ Files include: keyFile and SSL certificates or keys
○ Mode: 0600
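Applied as commands, using common package-default paths (the paths, user name and keyfile location below are assumptions; verify them on your system):

```shell
# Ownership for the service user+group
sudo chown -R mongod:mongod /var/lib/mongo /var/log/mongodb

# Modes per the slide
sudo chmod 0750 /var/lib/mongo                 # data path
sudo chmod 0640 /var/log/mongodb/mongod.log    # log file
sudo chmod 0600 /etc/mongod-keyfile            # keyFile / certificates (assumed path)
```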

Page 58:

Security: Network Access

● Firewall
○ Mainly port 27017

● Creating a dedicated network segment for Databases is recommended!

● DO NOT allow MongoDB to talk to the internet, under any circumstances!!!

Page 59:

Security: System Access

● Recommended to restrict system access to Database Administrators
● A “shell” on a system can be enough to take the system over!

Page 60:

Security: External Authentication

● LDAP Authentication
○ Supported in PSMDB and MongoDB Enterprise

Page 61:

Security: SSL Connections and Auth

● SSL / TLS Connections
● Intra-cluster authentication with x.509

Page 62:

Security: SSL Connections and Auth

Page 63:

Security: Encryption at Rest

● MongoDB Enterprise
● Percona Server for MongoDB
○ 3.6.8-20 has encryption at rest using a keyfile, in BETA

● Application-Level

Page 64:

High-Availability

Page 65:

High Availability - Replica Set

● Replication
○ Asynchronous
■ Write Concerns can provide pseudo-synchronous replication
■ Changelog-based, using the “Oplog”
○ Maximum 50 members
○ Maximum 7 voting members
■ Use “votes: 0” for members beyond 7
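Setting "votes: 0" on an extra member can be sketched in the mongo shell (the member index is a placeholder; note that non-voting members must also have priority 0):

```shell
mongo --eval '
  cfg = rs.conf();
  cfg.members[7].votes = 0;     // placeholder index for an 8th member
  cfg.members[7].priority = 0;  // non-voting members must have priority 0
  rs.reconfig(cfg);'
```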

Page 66:

High Availability - Oplog

● Oplog
○ The “oplog.rs” capped collection in “local”, storing changes to data
○ Read by secondary members for replication
○ Written to by the local node after “apply” of an operation
○ Events in the oplog are idempotent
■ operations produce the same results whether applied once or multiple times to the target dataset
○ Each event in the oplog represents a single document inserted, updated or deleted
○ The oplog has a default size depending on the OS and the storage engine
■ from 3.6 the size can be changed at runtime using the replSetResizeOplog admin command
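From 3.6, the resize is a single admin command; a sketch via the mongo shell (the 16000 MB size is an example value, not a recommendation):

```shell
# Check the current oplog size in bytes
mongo --eval 'db.getSiblingDB("local").oplog.rs.stats().maxSize'

# Resize the oplog to 16000 MB (example value)
mongo --eval 'db.adminCommand({ replSetResizeOplog: 1, size: 16000 })'
```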

Page 67:

What is a Replica Set

● Group of mongod processes that maintain the same dataset
● Provides redundancy and HA
● Suggested for all production environments
● Can provide increased read capacity
○ clients can send read operations to different servers
● Automatic failover

● Internals are similar (more or less) to MySQL replication
○ async
○ events are replicated by reading the primary node’s “oplog” collection
○ Primary = Master
○ Secondary = Slave

Page 68:

Replica Set: how it works

Page 69:

Replica Set: how it works

Page 70:

Replica Set: automatic failover

Page 71:

Automatic failover

● When a primary does not communicate with the other members for the electionTimeoutMillis period (10 seconds by default)
● The cluster attempts to complete the election of a new primary and resume normal operations
● The RS cannot process write operations until the election completes successfully
● The RS can continue to serve read queries if such queries are configured to run on secondaries while the primary is offline
● An eligible secondary calls for an election to nominate itself as the new primary

Page 72:

Architecture

● Datacenter Recommendations
○ Minimum of 3 physical servers required for High Availability
○ Ensure only 1 member per Replica Set is on a single physical server!!!

● EC2 / Cloud Recommendations
○ Place Replica Set members in an odd number of Availability Zones, same region
○ Use a hidden secondary node for Backup and Disaster Recovery in another region
○ Entire Availability Zones have been lost before!

Page 73:

Arbiter node

● A node with no data
● Usually used to have an odd number of nodes
● Cannot be elected during failover
● Can vote during the election

Page 74:

Priority

● Priority: weight of each single node

● Defines which nodes can be elected as Primary

● Here is a typical architecture deployed across 3 data centers

Page 75:

Hidden/Delayed secondary nodes

● HIDDEN SECONDARY
○ Maintains a copy of the primary’s data
○ Invisible to client applications
○ Run Backups, Statistics or special tasks
○ Must be priority = 0: cannot be elected as Primary, but votes during elections

● DELAYED SECONDARY
○ Reflects an earlier state of the dataset
○ Recover from unsuccessful application upgrades and operator errors; Backups
○ Must be priority = 0: cannot be elected as Primary, but votes during elections

Page 76:

More details in other sessions

MongoDB HA, what can go wrong?

Igor Donchovski

Wed 7th, 12:20 PM - 1:10 PM

@Bull

Page 77:

Quick Break and QA

15 minutes

Page 78:

Troubleshooting

“The problem with troubleshooting is trouble shoots back” ~ Unknown

Page 79:

Troubleshooting: db.currentOp()

● A function that dumps status info about running operations and various lock/execution details

● Only operations currently in progress are shown.
● The provided operation ID (opid) can be used to kill long-running queries using db.killOp()
● Includes
○ Original Query
○ Parsed Query
○ Query Runtime
○ Locking details

● Filter Documents
○ { "$ownOps": true } == only show operations for the current user
○ https://docs.mongodb.com/manual/reference/method/db.currentOp/#examples
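A sketch of combining these from the mongo shell (the 5-second threshold and the opid are placeholders):

```shell
# Operations of the current user running longer than 5 seconds
mongo --eval 'db.currentOp({ "$ownOps": true, "secs_running": { "$gt": 5 } })'

# Kill one by its opid (12345 is a placeholder)
mongo --eval 'db.killOp(12345)'
```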

Page 80:

Troubleshooting: db.currentOp()

Page 81:

Troubleshooting: db.stats()

● Returns
○ Document-data size (dataSize)
○ Index-data size (indexSize)
○ Real-storage size (storageSize)
○ Average Object Size
○ Number of Indexes
○ Number of Objects
○ https://docs.mongodb.com/manual/reference/method/db.stats/

Page 82:

Troubleshooting: db.stats()

Page 83:

Troubleshooting: Log File

● Interesting details are logged to the mongod/mongos log files
○ Slow queries
○ Storage engine details (sometimes)
○ Index operations
○ Sharding
■ Chunk moves
○ Elections / Replication
○ Authentication
○ Network
■ Connections
■ Errors
■ Client / inter-node connections

● The log can be really verbose
○ verbosity can be controlled using db.setLogLevel()
○ https://docs.mongodb.com/manual/reference/method/db.setLogLevel/

Page 84:

Troubleshooting: Log File - Slow Query

2018-09-19T20:58:03.896+0200 I COMMAND [conn175] command config.locks appName: "MongoDB Shell" command:
findAndModify { findAndModify: "locks", query: { ts: ObjectId('59c168239586572394ae37ba') }, update: {
$set: { state: 0 } }, writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 } planSummary:
IXSCAN { ts: 1 } update: { $set: { state: 0 } }
keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keysInserted:1 keysDeleted:1 numYields:0 reslen:604
locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: {
acquireCount: { w: 1 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } }
protocol:op_command 106ms
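Log lines like the one above can be filtered programmatically, which is the same idea mlogfilter automates. A minimal sketch, assuming the 3.x/4.0 plain-text log format where slow operations end in "<N>ms" (the sample line below is trimmed):

```javascript
// Trimmed sample of a mongod slow-operation log line.
const line = '2018-09-19T20:58:03.896+0200 I COMMAND [conn175] ' +
  'command config.locks appName: "MongoDB Shell" command: findAndModify ' +
  '{ findAndModify: "locks" } planSummary: IXSCAN { ts: 1 } 106ms';

// Extract the operation duration in milliseconds, or null if the line
// does not end with a duration.
function durationMs(logLine) {
  const m = logLine.trim().match(/(\d+)ms$/);
  return m ? parseInt(m[1], 10) : null;
}

console.log(durationMs(line)); // 106 - flag anything above your slow threshold
```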


Troubleshooting: Operation Profiler

● Writes slow database operations to a new MongoDB collection for analysis
○ Capped collection "system.profile" in each database, 1MB by default
○ The collection is capped, i.e. profile data doesn't last forever
● Support for operationProfiling data in PMM
● Enable operationProfiling in "slowOp" mode
○ Start with a very high threshold and decrease it in steps
○ Usually 50-100ms is a good threshold
○ Enable in mongod.conf:

operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100


Troubleshooting: Operation Profiler

● Useful profile metrics:
○ op/ns/query: type, namespace and query of a profile
○ keysExamined: # of index keys examined
○ docsExamined: # of docs examined to achieve the result
○ writeConflicts: # of write conflicts encountered during update
○ numYields: # of times the operation yielded to others
○ locks: detailed lock statistics
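Those metrics support a rough index-efficiency check: when docsExamined is much larger than nreturned, the query scans far more than it returns and likely needs a (better) index. A sketch over a hypothetical system.profile document:

```javascript
// Hypothetical system.profile document, trimmed to the fields used here.
const profileDoc = {
  op: "query",
  ns: "shop.orders",
  keysExamined: 0,
  docsExamined: 50000,
  nreturned: 25,
};

// Documents examined per document returned; 1 is ideal for point reads.
function scanRatio(doc) {
  return doc.docsExamined / Math.max(doc.nreturned, 1);
}

if (profileDoc.keysExamined === 0) {
  console.log("collection scan - no index used");
}
console.log(scanRatio(profileDoc)); // 2000 docs examined per doc returned
```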


Troubleshooting: .explain()

● Shows the query explain plan for query cursors
● This will include:
○ Winning plan
■ Query stages
● Query stages may include sharding info in clusters
■ Index chosen by the optimiser
○ Rejected plans
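The winning plan is a tree read from the top stage down its inputStage chain. A sketch of walking it, over a trimmed, hypothetical explain document showing a non-covered index read (FETCH over IXSCAN); a leading COLLSCAN would mean no usable index was found:

```javascript
// Trimmed, hypothetical explain() output.
const explainOutput = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "username_1" },
    },
    rejectedPlans: [],
  },
};

// Collect stage names from the top stage down the inputStage chain.
function planStages(plan) {
  const stages = [];
  while (plan) {
    stages.push(plan.stage);
    plan = plan.inputStage;
  }
  return stages;
}

console.log(planStages(explainOutput.queryPlanner.winningPlan)); // [ 'FETCH', 'IXSCAN' ]
```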


Troubleshooting: mlogfilter

● A useful tool for processing mongod.log files
● A log-aware replacement for 'grep', 'awk' and friends
● Generally focus on:
○ mlogfilter --scan <file>
■ Shows all collection-scan queries
○ mlogfilter --slow <ms> <file>
■ Shows all queries slower than the given number of milliseconds
○ mlogfilter --op <op-type> <file>
■ Shows all queries of the given operation type (eg: find, aggregate, etc.)
● More on this tool here: https://github.com/rueckstiess/mtools


Troubleshooting: mongostat

● Shows the status of the current workload of a running mongod instance
● Useful after deploying a new application or for investigating ongoing unusual behavior
● By default it reports metrics every second:
○ number of inserted/updated/deleted/read documents
○ percentage of the WiredTiger cache in use/dirty
○ number of flushes to disk
○ inbound/outbound traffic

https://docs.mongodb.com/manual/reference/program/mongostat/


Troubleshooting: mongotop

● Tracks the time spent reading and writing data
● Statistics are reported per collection
● Useful to identify which collections take the most time in reads and writes
● By default it reports metrics every second

https://docs.mongodb.com/manual/reference/program/mongotop/


Schema Design

“The problem with troubleshooting is that trouble shoots back” ~ Unknown


Schema Design: Data Types

● Strings
○ Only use strings if required
○ Do not store numbers as strings!
○ Look for {field: "123456"} instead of {field: 123456}
■ "12345678" moved to an integer uses 25% less space
■ Range queries on proper integers are more efficient
○ Example JavaScript to convert a field in an entire collection:

db.items.find().forEach(function(x) {
    var newItemId = parseInt(x.itemId);
    db.items.update(
        { _id: x._id },
        { $set: { itemId: newItemId } }
    );
});

○ Do not store dates as strings!
■ The field "2017-08-17 10:00:04 CEST" stored as a date uses 52.5% less space!
○ Do not store booleans as strings!
■ "true" -> true = 47% less space wasted


Schema Design: Indexes

● MongoDB supports B-tree, text and geo indexes
● The collection is locked until indexing completes
○ Index creation is a really heavy task
● Avoid drivers that auto-create indexes
○ Use real performance data to make indexing decisions; find out before Production!
● Too many indexes hurt write performance for the entire collection
○ Index entries must be maintained for every insert/update/delete
● Indexes have a forward or backward direction
○ Try to cover .sort() with an index and match its direction!


Non-blocking Index Creation

● db.collection.createIndex() supports the {background: true} option
○ Index creation doesn't lock the collection
○ The collection can be used by other queries
○ Index creation takes longer than a foreground build
○ Unpredictable performance!
● With a Replica Set, an index can be created using the following rolling procedure:
○ Detach a SECONDARY from the RS, create the index in foreground, then reconnect it to the RS
○ Repeat for all the SECONDARY nodes
○ Finally, detach the PRIMARY
■ Wait for the election and detach the node once it is SECONDARY
■ Create the foreground index
■ Reconnect the node to the RS


Schema Design: Indexes

● Compound Indexes
○ Several fields supported
○ Fields can be in forward or backward direction
■ Consider any .sort() query options and match the sort direction!
○ Composite keys are read left -> right
■ An index can be partially read
■ Left-most fields do not need to be duplicated!
■ All indexes below are duplicates: the shorter ones are left-most prefixes of the first
● {username: 1, status: 1, date: 1, count: -1}
● {username: 1, status: 1, date: 1}
● {username: 1, status: 1}
● {username: 1}
■ Duplicate indexes should be dropped
● Use db.collection.getIndexes() to view current indexes
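The left-most-prefix rule above can be checked mechanically: an index is redundant when its key pattern (fields and directions, in order) is a strict prefix of another index's pattern. A sketch, modelling key patterns as arrays of [field, direction] pairs to preserve order:

```javascript
// Key patterns from the example above, in declaration order.
const indexes = [
  [["username", 1], ["status", 1], ["date", 1], ["count", -1]],
  [["username", 1], ["status", 1]],
  [["username", 1]],
];

// True when `candidate` is a strict prefix of `other`, making it redundant.
function isPrefix(candidate, other) {
  return candidate.length < other.length &&
    candidate.every((key, i) =>
      key[0] === other[i][0] && key[1] === other[i][1]);
}

const redundant = indexes.filter(a => indexes.some(b => isPrefix(a, b)));
console.log(redundant.length); // 2 - the two shorter indexes can be dropped
```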


Schema Workflow

● Read-Heavy Workflow
○ Read-heavy apps benefit from pre-computed results
○ Consider moving expensive read computation to insert/update/delete time
○ Example 1: An app does 'count' queries often
■ Move the .count() read query to a summary document with counters
■ Increment/decrement a single count value at write time
○ Example 2: An app that does groupings of data
■ Move the in-line .aggregate() read query to a backend summary worker
■ Read from a summary collection, like a view
● Write-Heavy Workflow
○ Reduce indexing as much as possible
○ Consider batching, or a decentralised model with lazy updating (eg: social media graph)
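The counter pattern from Example 1 can be sketched as follows; a plain in-memory object stands in for a MongoDB summary document that would be updated with { $inc: ... } at write time (the collection and field names are hypothetical):

```javascript
// Summary "document" maintained at write time instead of counting on read.
const summary = { ordersTotal: 0 };

function insertOrder(order) {
  // ...write the order document to the orders collection...
  summary.ordersTotal += 1; // the equivalent of $inc at write time
}

function deleteOrder(orderId) {
  // ...remove the order document...
  summary.ordersTotal -= 1;
}

for (let i = 0; i < 5; i++) insertOrder({ _id: i });
deleteOrder(3);
console.log(summary.ordersTotal); // 4 - the read is now O(1), no count() scan
```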


Schema Workflow

● No list of fields specified in .find()
○ MongoDB returns entire documents unless fields are specified
○ Only return the fields required for the application operation!
○ Covered-index operations require that only the index fields be specified
● Many $and or $or conditions
○ MongoDB (or any RDBMS) doesn't handle large lists of $and or $or efficiently
○ Try to avoid this sort of model with:
■ Data locality
■ Background summaries / views


More details in other sessions

MongoDB Sharding 101

Adamo Tonete

Tuesday 6th Nov, 4:30PM - 5:20PM

@Bull


Multi-document ACID Transactions


Multi-document ACID transactions

● New in 4.0
● Writes on multiple documents in different collections can be included in a single transaction
● ACID properties are supported:
○ Atomicity
○ Consistency
○ Isolation
○ Durability
● Through snapshot isolation, transactions provide a consistent view of data and enforce all-or-nothing execution to maintain data integrity
● Available for Replica Sets and the WiredTiger storage engine only
○ To use transactions on a standalone server you need to start it as a Replica Set
○ Transaction support for sharded clusters is scheduled for 4.2


Limitations

● A collection MUST exist in order to be used in a transaction
● A collection cannot be created or dropped inside a transaction
● An index cannot be created or dropped inside a transaction
● Non-CRUD operations cannot be used inside a transaction; for example, commands like createUser, getParameter, etc.
● Cannot read/write the config, admin and local databases
● Cannot write to system.* collections
● Before using a transaction, a session must be created
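Putting these rules together, a minimal mongo-shell sketch of a 4.0 transaction. The database/collection names are hypothetical, both collections must already exist, and the server must be a replica set member; this is an illustrative sketch, not a complete error-handling strategy:

```javascript
// Hypothetical example: decrement stock atomically with the order insert.
var session = db.getMongo().startSession();
session.startTransaction();
try {
    var orders = session.getDatabase("shop").orders;
    var stock  = session.getDatabase("shop").stock;
    orders.insertOne({ _id: 1, item: "abc", qty: 2 });
    stock.updateOne({ item: "abc" }, { $inc: { qty: -2 } });
    session.commitTransaction();  // all-or-nothing
} catch (e) {
    session.abortTransaction();   // roll back both writes
    throw e;
} finally {
    session.endSession();
}
```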


Is my app good for transactions?

● Yes, it is if:
○ You have a lot of 1:N and N:N relationships between different collections and you care about data consistency
○ You manage commercial/financial and/or really sensitive data
○ You need data consistency because your app requires it
● In general, remember the following:
○ Transactions incur a greater performance cost than single-document writes
○ Transactions should not be a replacement for effective schema design
■ Embed documents as much as possible
■ A denormalized data model continues to be optimal
■ Single-document writes are always atomic


More details in other sessions

Use multi-document ACID transactions in MongoDB 4.0

Corrado Pandiani

Wed 7th Nov, 2:20PM - 3:10PM

@Bull

What’s new in MongoDB 4.0

Vinicius Gripps

Tue 6th Nov, 11:20AM - 12:10PM

@Bull


Benchmark and replay tools


mongoreplay

● Available in 3.4+
● Captures traffic sent to a mongod instance
● Replays the captured traffic later against a different mongod
● Provides feedback on the replayed traffic
● Useful to test a new MongoDB deployment with the real workload:
○ Testing a different storage engine
○ Testing different hardware
○ Testing a different OS configuration


mongoreplay usage

● Capture traffic with the record command and create the playback file:
○ mongoreplay record -i eth0 -e "port 27017" -p ~/recordings/playback
● Replay the recorded playback file with the play command:
○ mongoreplay play -p ~/recordings/playback --report ~/reports/replay_stats.json --host mongodb://192.168.0.4:27018
● Inspect a live mongod instance with the monitor command:
○ mongoreplay monitor -i eth0 -e 'port 27017' --report ~/reports/monitor-live.json --collect json

https://docs.mongodb.com/manual/reference/program/mongoreplay/


flashback

● Third-party tool
● Records queries from the profiler
○ setProfilingLevel set to 2
● Replays captured ops
○ The replayer can send these ops to the database as fast as possible to test its limits
○ Or it can replay ops according to their original timestamps, which imitates regular traffic

https://github.com/facebookarchive/flashback


Monitoring


Monitoring: Methodology

● Monitor often
○ A 60-300 second polling interval is not enough!
○ Problems can begin/end in seconds
● Correlate database and operating system metrics together!
● Monitor a lot
○ Store more than you graph
○ Example: PMM gathers 700-900 metrics per polling interval
● Process
○ Use monitoring to troubleshoot Production events/incidents
○ Iterate and improve monitoring
■ Add graphing for whatever made you SSH to a host
■ Blind QA with someone unfamiliar with the problem


Monitoring: Important Metrics

● Database
○ Operation counters
○ Cache traffic and capacity
○ Checkpoints
○ Concurrency tickets (WiredTiger)
○ Document and index scanning
● Operating System
○ CPU
○ Disk
■ Bandwidth / utilisation
■ Average wait time
○ Memory and network


Monitoring: Percona PMM

● Open-source monitoring from Percona!
● Based on open-source technology:
○ Prometheus
○ Grafana
○ Go language
● Simple deployment
● Examples in this demo are from PMM!
● Correlation of OS and DB metrics
● 800+ metrics per ping


Monitoring: Percona PMM

https://pmmdemo.percona.com


More details in other sessions

Monitoring MongoDB with Percona Monitoring and Management (PMM)

Michael Coburn

Tue 5:25PM - 5:50PM

Bull


Questions
