MongoDB Backups The current state of the ecosystem

Upload: vuongnhan

Post on 24-Aug-2019


Page 1: MongoDB Backups - Percona Current State of... · 8 Today’s Tools - Online Hot Backup Percona Server for MongoDB has added online hot backups Native command in mongo, no need for

MongoDB Backups

The current state of the ecosystem


Agenda

● Who am I?
● Today’s typical backup types:
  ○ Logical
  ○ Binary
  ○ Snapshot (iSCSI/LVM)
  ○ Ops Manager / Atlas Backups
● Complications when it comes to sharding
● How to get consistent sharded backups in v3.2+
● Percona Labs’ Mongo consistent backup tool


Who am I?

David Murphy
Past key roles:
● Electronic Arts
  ○ NoSQL / MySQL Architect
● ObjectRocket / Rackspace
  ○ MongoDB Lead / Architect

MySQL since 3.22 (yes, the 90s)
Mongo Master Alumni and Contributor
Using MongoDB since 1.6


Today’s Backup Types

Looking at today’s single-node and replica set backups, and the good and the bad in each.


Today’s Tools - Logical Backups

Almost always done with mongodump, which has some particular considerations:

❏ You must determine which secondary you want to talk to
❏ The -h/--host option points to a single host or to a replica set (using secondary reads)
  ❏ But it might choose a node you don’t intend it to
  ❏ It does not protect against lagging secondaries
❏ A single (standalone) node cannot be consistently backed up!
  ❏ MongoDB reads are effectively read-uncommitted, and without an oplog mongodump has no --oplog to capture writes made during the dump, so these backups are not safe.
❏ Restores take a huge amount of time, but the space used is tiny
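To make the first points concrete, here is a minimal sketch that picks the least-lagged healthy secondary from a replSetGetStatus reply (for example what pymongo’s client.admin.command("replSetGetStatus") returns) and builds a mongodump command with --oplog against it. The 30-second lag ceiling, host names and output path are arbitrary assumptions for illustration.

```python
from datetime import datetime, timedelta


def pick_secondary(rs_status, max_lag=timedelta(seconds=30)):
    """Return host:port of the least-lagged healthy SECONDARY, or None.

    rs_status is assumed to be a replSetGetStatus reply, using its
    members[].name, stateStr, health and optimeDate fields.
    """
    members = rs_status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        return None
    candidates = []
    for m in members:
        if m["stateStr"] != "SECONDARY" or m.get("health") != 1:
            continue  # skip the primary, arbiters, recovering or down members
        lag = primary["optimeDate"] - m["optimeDate"]
        if lag <= max_lag:  # refuse secondaries that lag too far behind
            candidates.append((lag, m["name"]))
    return min(candidates)[1] if candidates else None


def mongodump_argv(host, out_dir):
    # --oplog also dumps oplog entries written during the dump, so the
    # result is consistent to the dump's end; it needs a replica set member.
    return ["mongodump", "--host", host, "--oplog", "--out", out_dir]


if __name__ == "__main__":
    now = datetime(2017, 6, 1, 12, 0, 0)  # hypothetical status snapshot
    status = {"members": [
        {"name": "a:27017", "stateStr": "PRIMARY", "health": 1, "optimeDate": now},
        {"name": "b:27017", "stateStr": "SECONDARY", "health": 1,
         "optimeDate": now - timedelta(seconds=5)},
    ]}
    print(mongodump_argv(pick_secondary(status), "/backups/rs0"))
```

Choosing the host yourself, rather than handing mongodump a replica set string, is what keeps it from landing on a node you did not intend or on a badly lagging secondary.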


Today’s Tools - Snapshot (LVM)

Assuming you are using LVM, there are some considerations. However, backups will always be 100% of the data size and will restore quickly, with no need to “re-hydrate”.

A snapshot can be made instantly:
❖ You must choose which node to take the backup on (usually people add a hidden node)
❖ Restores are fast and consistent
❖ A restore only takes the time needed to copy the files back into place
❖ Backups use 100% of the normal space; compressing them slows restores

➢ You need spare space in the VG for a snapshot volume
➢ The snapshot’s COW table grows as data changes; if it fills the space allocated to the snapshot, the snapshot becomes invalid and stops being usable
➢ Serious performance issues will occur while the snapshot is active
➢ You will want to delete the snapshot ASAP, after rsyncing its contents somewhere else
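A minimal sketch of the LVM workflow above, expressed as the ordered commands one node would run. The volume group, logical volume, snapshot size, mount point and rsync destination are hypothetical placeholders, and it assumes mongod’s data files and journal sit on the same logical volume so the snapshot is crash-consistent.

```python
def lvm_snapshot_plan(vg, lv, snap_size, mount_point, dest):
    """Return argv lists for one LVM snapshot backup of a mongod data volume.

    vg, lv, snap_size, mount_point and dest are hypothetical placeholders.
    Assumes data files and journal live on the same LV.
    """
    snap = f"{lv}-backup-snap"
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        # Copy-on-write snapshot; snap_size caps how much COW space it gets.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap,
         f"/dev/{vg}/{lv}"],
        # Mount read-only so nothing modifies the frozen image.
        ["mount", "-o", "ro", snap_dev, mount_point],
        # Copy the frozen files somewhere else...
        ["rsync", "-a", f"{mount_point}/", dest],
        ["umount", mount_point],
        # ...then drop the snapshot ASAP to end the COW performance penalty.
        ["lvremove", "-f", snap_dev],
    ]
```

Each argv list could be passed to subprocess.run(cmd, check=True); in real use the umount and lvremove steps belong in a finally block so a failed rsync does not leave the snapshot growing. On XFS the snapshot also has to be mounted with -o ro,nouuid.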


Today’s Tools - Snapshot (iSCSI/NFS/EBS)

Everything said about LVM (instant snapshots, fast and consistent restores) applies here; however:

■ The COW overhead might be cheaper
■ Deduplication on most SANs
■ Incremental snapshots on most SANs
■ MMAPv1 performance is non-viable
■ Other engines are possible but not advised
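When the data files and journal are spread across volumes, a storage-level snapshot is only safe if writes are blocked while it is taken. A minimal sketch of that ordering, assuming admin_db is a pymongo Database for "admin" (so command("fsync", lock=True) and command("fsyncUnlock") are sent to mongod) and take_snapshot is a hypothetical callable that triggers the SAN or EBS snapshot:

```python
from contextlib import contextmanager


@contextmanager
def write_frozen(admin_db):
    """Block writes on one mongod for the duration of the with-block."""
    admin_db.command("fsync", lock=True)  # flush to disk and lock writes
    try:
        yield
    finally:
        # Always unlock, even if the snapshot call fails.
        admin_db.command("fsyncUnlock")


def snapshot_node(admin_db, take_snapshot):
    # take_snapshot is hypothetical: e.g. a wrapper around your SAN/EBS API.
    with write_frozen(admin_db):
        return take_snapshot()
```

Because the lock blocks writes on that node, this belongs on a hidden or otherwise idle secondary, in the same spirit as the LVM advice.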


Today’s Tools - Online Hot Backup

Percona Server for MongoDB has added online hot backups:

● A native command in mongo, no need for a third-party tool
● WiredTiger and RocksDB are both supported
  ○ In-Memory is not supported, because it keeps no data files on disk to copy
  ○ MMAPv1 cannot support it, because it offloads flushing to the OS
● It picks a point in the engine’s internal transaction logs and starts copying data
● It copies to a new folder as binary files

Restoring these backups is simply a matter of starting a mongod process with the new folder as its DB path.

It is not shard-aware, so cluster-wide backups still need something more.
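Percona Server for MongoDB exposes this as the createBackup admin command with a backupDir argument. A minimal sketch, assuming admin_db is a pymongo Database for "admin" on a server that supports the command; the path is a placeholder and refers to the mongod host’s filesystem, not the client’s.

```python
def hot_backup(admin_db, backup_dir):
    """Ask Percona Server for MongoDB to write a hot backup into backup_dir.

    pymongo turns this into {createBackup: 1, backupDir: backup_dir}.
    """
    return admin_db.command("createBackup", backupDir=backup_dir)


def restore_argv(backup_dir, port=27017):
    # Restoring is just starting mongod with the backup folder as its dbpath.
    return ["mongod", "--dbpath", backup_dir, "--port", str(port)]
```

For a replica set you would run this per member or on one member and resync the rest; as the slide says, nothing here coordinates shards.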


Today’s Tools - MongoDB Ops Manager / Atlas

Using this tool, you are choosing NOT to be open source, and you are locked into a vendor!

▪ Initial backups
  • Sends documents to the MOM server in 10MB chunks, then sends all oplog changes
  • Builds a copy of the DB and applies the oplogs
  • Marks this as the first completed backup
▪ Oplog streaming
  • From then on it can simply stream and apply oplogs to the backup, the way replication does
▪ Snapshots
  • At regular points in time, the current version of the DB copy is cloned
  • New oplogs are applied to only one side
  • This gives you snapshots you can return to, perhaps daily or hourly
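The initial-copy, oplog-streaming and snapshot cycle can be modelled with a small in-memory toy. This is an illustration of the idea only, not how Ops Manager is implemented: documents live in a dict keyed by _id, and oplog entries are simplified to {"op": "i" | "u" | "d", "doc": ...} with full-document updates.

```python
import copy


class ToyBackupDaemon:
    """Toy model: initial copy, oplog replay, and periodic cloned snapshots."""

    def __init__(self, initial_docs):
        # Initial backup: build the copy DB from the shipped documents.
        self.copy_db = {d["_id"]: dict(d) for d in initial_docs}
        self.snapshots = []

    def apply(self, entry):
        # Oplog streaming: replay each change, like a secondary would.
        op, doc = entry["op"], entry["doc"]
        if op in ("i", "u"):
            self.copy_db[doc["_id"]] = dict(doc)  # insert or full replace
        elif op == "d":
            self.copy_db.pop(doc["_id"], None)

    def snapshot(self, label):
        # Clone the current copy; later oplogs only touch copy_db, not the clone.
        self.snapshots.append((label, copy.deepcopy(self.copy_db)))
```

The deep copy is what makes "new oplogs are applied to only one side" true: each snapshot stays frozen while the live copy keeps moving.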


Complications with Sharding

How do we back up when we are using shards? How do we time things well?


Sharding Complications: Sharding Support

• Logical backups
  - No native support for consistent sharded backups
• Binary/Snapshot backups
  - No native support for consistent sharded backups
• MongoDB Ops Manager
  - Only snapshot support, no true Point-In-Time Recovery (PITR) support


Sharding Complications: Consistency

Different shards will finish different backup types at different times.

• Logical backups
  - Each shard will be a different size, so backups will finish at different times
  - mongodump runs taken through mongos cannot use --oplog, and therefore won’t be consistent on each shard
  - As different dumps finish at different times, three questions come up:
    • Is the balancer off?
    • Are there any migrations running?
    • What about new DBs and manual moves?
• Binary/Snapshot backups
  - These worked great on a single replica set, but how do I make them all run at the same time? How do I make sure the above questions are answered?
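The first two questions can be checked mechanically before (and after) a sharded logical backup. A minimal sketch, assuming balancer_status is the reply of the balancerStatus admin command run on mongos (MongoDB 3.4+), which reports mode and inBalancerRound; new_dbs is a hypothetical list the caller builds by comparing listDatabases output before and after the dump. Manual moveChunk calls are not visible through this check.

```python
def preflight_problems(balancer_status, new_dbs=()):
    """Return reasons a sharded logical backup is unsafe right now (empty = ok).

    balancer_status: assumed balancerStatus reply, e.g.
        {"mode": "off", "inBalancerRound": False}
    new_dbs: hypothetical list of databases that appeared during the backup.
    """
    problems = []
    if balancer_status.get("mode") != "off":
        problems.append("balancer is enabled")
    if balancer_status.get("inBalancerRound"):
        problems.append("a balancer round (chunk migration) is in progress")
    for name in new_dbs:
        problems.append(f"database {name!r} appeared during the backup")
    return problems
```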


Consistent Sharded Backups

The new design, with the config servers running as a replica set, has solved a very complicated backup issue: point-in-time recovery (PITR) of a sharded cluster.


What Does 3.2 Help Fix?

3.2 was a HUGE leap forward for operations groups backing up MongoDB. Having the config servers be a replica set allows all parts of the system to be handled as one:

• If someone was able to run a snapshot on all shards and a config server at exactly the same time, this wasn’t an issue. However, micro time variations could result in missing a change, and therefore in failing recovery tests.
• There was no good way to roll each shard forward to Backup + 1 hour and then update all the config metadata to match. Now we can say “restore everything to Backup + 1 hour” and know it is safe and exactly what the system looked like at that time.
• Some more tooling is still needed to constantly capture the oplog for that case, but at least it is possible to do now.
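“Restore everything to Backup + 1 hour” amounts to replaying every replica set, shards and config servers alike, up to the same oplog timestamp. A minimal sketch using mongorestore’s real --oplogReplay and --oplogLimit options; the replica set names, hosts and dump paths are hypothetical, and it assumes each dump directory’s oplog.bson already contains the captured oplog through the target time (the extra tooling the slide mentions).

```python
from datetime import datetime, timedelta, timezone


def restore_plan(dump_dirs, backup_time, roll_forward=timedelta(hours=1)):
    """One mongorestore command per replica set, all stopping at the same time.

    dump_dirs: hypothetical {rs_name: (host, dump_directory)}, config RS included.
    backup_time: timezone-aware datetime of the base backup.
    """
    target = backup_time + roll_forward
    # --oplogLimit takes <seconds since epoch>[:ordinal]; entries at or
    # after that timestamp are not applied.
    limit = f"{int(target.timestamp())}:0"
    return {
        rs: ["mongorestore", "--host", host, "--oplogReplay",
             "--oplogLimit", limit, path]
        for rs, (host, path) in dump_dirs.items()
    }


if __name__ == "__main__":
    t0 = datetime(2017, 6, 1, tzinfo=timezone.utc)  # hypothetical backup time
    plan = restore_plan({"shard1": ("s1:27018", "/bk/shard1"),
                         "configRS": ("c1:27019", "/bk/config")}, t0)
    for rs, cmd in plan.items():
        print(rs, " ".join(cmd))
```

The single shared limit is the whole point: with the config servers in a replica set, their metadata can be rolled to exactly the same instant as the shards.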


Percona Labs’ Backup Tool

What if there was a tool that let you point to a replica set or cluster, and it would take care of backing up the shards, aligning the recovery point, and compressing everything into a central logical place?


Mongo Consistent Backup

● A Python tool shipped as a single binary file that only needs Python 2.7
● Intelligent enough to detect a mongos, become self-recursive, and back up all your shards or a replica set automatically
● Single mongods won’t work with this tool
  ○ mongodump can’t consistently back them up: there is no oplog to be consistent with!
● Ensures all shards’ dump times are consistent with each other
  ○ Opens oplog tailers on all shards until the last dump finishes
● After that is done:
  ○ If 3.2+: it has also been dumping/tailing the config servers, so everything is consistent :)
  ○ If 3.0 or before: it fsync-locks a config server and dumps it at the last moment, with the balancer off for the whole backup.
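The alignment step can be sketched as picking one oplog position every shard can be rolled forward to. This is a simplified illustration of the idea, not the tool’s actual resolver code: oplog positions are modelled as (seconds, ordinal) tuples like a BSON Timestamp, and tailer_read_through is assumed to hold the position each shard’s tailer is known to have read through when the tailers were stopped.

```python
def consistent_end_point(dump_end_ts, tailer_read_through):
    """Pick one oplog position that every shard can be rolled forward to.

    Both arguments map shard name -> (seconds, ordinal) tuple.
    """
    # Each shard's tail only covers up to its own read-through point, so the
    # latest point *all* shards reach is the earliest of those points.
    end = min(tailer_read_through.values())
    # That point must not be earlier than any shard's own dump end.
    if end < max(dump_end_ts.values()):
        raise RuntimeError("oplog tailers stopped before the last dump finished")
    return end


def entries_to_replay(oplog_entries, dump_end, end_point):
    # Replay only entries after this shard's dump ended, up to the shared point.
    return [e for e in oplog_entries if dump_end < e["ts"] <= end_point]
```

Keeping the tailers running until the slowest dump finishes is what guarantees the check passes: every shard then has oplog coverage past the latest dump end.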


Where Is It Going?

The vision:

• Remain 100% open source and free to the community
• Community involvement: telling us what features you need, contributing improvements, and reporting bugs
• Oplog recorder daemon
  - Constantly collects the oplog from each shard and stores it centrally, so the recorded history can extend beyond the oplog’s own size
  - Allows incremental backups and recovery granular to the second
  - ...while letting you control the retention based on your budget
• Uploading to additional cloud locations like Google Cloud Storage, Azure ZRS, Rackspace Cloud Files
• Restore tooling to make restores of sharded clusters more automated
• Encryption support
• Ability to filter some collections/databases out of backups and restores
• Offline backup querying (RocksDB)
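With an oplog recorder, “recovery granular to the second” reduces to a coverage question: is there a full backup at or before the target, and unbroken recorded oplog from that backup to the target? A minimal sketch under a hypothetical data model (full-backup end times and recorded oplog ranges, all in seconds since the epoch); it only considers the newest eligible base backup.

```python
def can_restore_to(target, backups, oplog_chunks):
    """Return the base backup time to restore from, or None if impossible.

    backups: hypothetical list of full-backup end times.
    oplog_chunks: hypothetical list of recorded (start, end) oplog ranges.
    """
    bases = [b for b in backups if b <= target]
    if not bases:
        return None
    base = max(bases)  # newest full backup at or before the target
    covered = base
    # Walk recorded ranges in order; any gap before the target is fatal.
    for start, end in sorted(oplog_chunks):
        if start > covered:
            break  # oplog between `covered` and `start` was never recorded
        covered = max(covered, end)
        if covered >= target:
            return base
    return base if covered >= target else None
```

Retention then becomes a budget decision: pruning old chunks or backups simply shrinks the set of targets for which this returns a base.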


Where is it going?

Version 1.0 released soon, see: https://github.com/Percona-Lab/mongodb_consistent_backup/tree/MCB_1.0

New Features
• Logging to file, with compression of the previous log
  - Logging of backup output (e.g. mongodump stderr output)
  - Password obfuscation in log output
• Nested YAML config file (--config=<file> flag)
• ZBackup archive method
  - Significant disk usage reduction at the cost of some additional CPU
  - Block-level de-duplication with LZMA compression
  - AES-128 CBC encryption at rest (optional)
  - The upload phase does not yet support ZBackup (coming soon)

Improvements
• Improved thread and race-condition safety
  - Stronger validation of thread state (oplog position, success, etc.)
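One plausible way to implement password obfuscation in log output in Python is a logging.Filter that rewrites each record before any handler writes it. This is not the tool’s actual implementation, and the two patterns (a --password flag and credentials embedded in a mongodb:// URI) are illustrative assumptions rather than an exhaustive list.

```python
import logging
import re

# Illustrative secret patterns only: a --password flag and URI credentials.
_SECRET_PATTERNS = [
    (re.compile(r"(--password[= ])(\S+)"), r"\1******"),
    (re.compile(r"(mongodb://[^:/@\s]+:)([^@\s]+)(@)"), r"\1******\3"),
]


class PasswordObfuscator(logging.Filter):
    """Mask passwords in log records so they never reach the log file."""

    def filter(self, record):
        msg = record.getMessage()  # merge %-style args into the text first
        for pattern, repl in _SECRET_PATTERNS:
            msg = pattern.sub(repl, msg)
        record.msg, record.args = msg, ()
        return True  # keep the (now masked) record
```

Attaching it to the logger, rather than one handler, means every destination (file, stderr, compressed archives) only ever sees the masked text.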


Mongo Consistent Backup 1.0 (cont’d)

Improvements
• Strong write concern for the balancer lock update (3.0 and older only)
• Better failure resistance, with support for multiple seed hosts (--host flag or ‘host’ config var)
• Faster RPM build (e.g. make rpm)
• Updated dependencies
  - A pymongo upgrade adds support for new data types in MongoDB 3.4
• Experimental
  - Dockerfile for running mongodb_consistent_backup in Docker (e.g. Kubernetes/Mesos, EC2 Container Service); you must set up persistent Docker volumes yourself!
• Many other smaller bugfixes and optimizations


Where do I find it?

https://github.com/Percona-Lab/mongodb_consistent_backup
• GPL license
• Encourages community participation
• Very actively developed for use in our services
• All issues go to me and to the MongoDB escalations team at Percona


Rate My Session