5 Pitfalls to Avoid with MongoDB

Tim Callaghan, VP/Engineering

[email protected] | @tmcallaghan

Tokutek: Database Performance Engines

What is Tokutek?

Tokutek® offers high performance and scalability for MySQL, MariaDB and MongoDB. Our easy-to-use open source solutions are compatible with your existing code and application infrastructure.

Tokutek Performance Engines Remove Limitations

- Improve insertion performance by 20X
- Reduce HDD and flash storage requirements up to 90%
- No need to rewrite code

Tokutek Mission: Empower your database to handle the Big Data requirements of today’s applications

A Global Customer Base

Housekeeping

• This presentation will be available for replay following the event

• We welcome your questions; please use the console on the right of your screen and we will answer following the presentation

• A copy of the presentation is available upon request

Agenda

• Describe use-cases that lead to well known pitfalls

• How can they be avoided?

• Test, Measure, and Analyze (benchmark)

Pitfalls - 1982

Pitfalls - 2013

What is TokuMX?

• TokuMX = MongoDB with improved storage

• Drop in replacement for MongoDB v2.4 applications
  • Including replication and sharding
  • Same data model
  • Same query language
  • Drivers just work
  • No Full Text or Geospatial

• Open Source
  – http://github.com/Tokutek/mongo

Pitfall 1 : Space

1a : Space

• MongoDB databases often grow quite large
  • it easily allows users to...
    • store large documents
    • keep them around for a long time

• de-normalized data needs more space

• Operational challenges
  • Big disks are cheap, but not fast
  • Cloud storage is even slower
  • Fast disks (flash) are expensive
  • Backups are large as well

• Unfortunately, MongoDB does not offer compression

• goal = use less disk/flash

1a : Space : Avoidance

• TokuMX offers built-in compression
  • 3 compression algorithms
    • quicklz, zlib, lzma, (none)

• Everything is compressed
  • Field names and values
  • Secondary indexes too

1a : Space : Test

• BitTorrent Peer Snapshot Data (~31 million documents)
  • 3 Indexes : peer_id + created, torrent_snapshot_id + created, created

{ id: 1, peer_id: 9222, torrent_snapshot_id: 4, upload_speed: 0.0000, download_speed: 0.0000, payload_upload_speed: 0.0000, payload_download_speed: 0.0000, total_upload: 0, total_download: 0, fail_count: 0, hashfail_count: 0, progress: 0.0000, created: "2008-10-28 01:57:35" }

http://cs.brown.edu/~pavlo/torrent/

1a : Space : Analyze

size on disk, ~31 million inserts (lower is better)

TokuMX achieved 11.6:1 compression
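As a quick check on what an 11.6:1 ratio means for storage, the arithmetic below converts a compression ratio into the fraction of space saved. The function name is illustrative and not part of any Tokutek tool.

```python
def space_saved(compression_ratio: float) -> float:
    """Fraction of disk space saved for an uncompressed:compressed ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / compression_ratio


# 11.6:1 is the ratio reported for the BitTorrent data set above.
print(f"{space_saved(11.6):.1%} less space")
```

At 11.6:1 this works out to a bit over 91%, which is in line with the "up to 90%" storage reduction quoted earlier in the deck.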

1b : Space

• MongoDB stores field names in each document
  • Lots of redundant data
  • When field names are long, documents may contain more field name data than actual values

• Google “mongodb long field names”
  • Lots of blogs and advice

• ... but descriptive schemas are useful!

1b : Space : Avoidance

• Again, TokuMX offers built-in compression
  • Field names are compressed along with values

• Compression algorithms love redundant data

• Be descriptive and toss that data dictionary!
  • Who knows what is in field “zq”? Not me.

1b : Space : Test

schema 1 - long field names (10/20/20)

{ first_name : “Tim”, last_name : “Callaghan”, email_address : “[email protected]” }

schema 2 - short field names (26 fewer bytes per doc)

{ fn : “Tim”, ln : “Callaghan”, ea : “[email protected]” }
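The "26 bytes per doc" figure can be checked by counting the bytes each schema spends on field names. The sketch below assumes BSON's layout, where every field name is stored as a UTF-8 string followed by a one-byte terminator; the helper name is made up for illustration.

```python
LONG_NAMES = ["first_name", "last_name", "email_address"]
SHORT_NAMES = ["fn", "ln", "ea"]


def field_name_bytes(names):
    """Bytes spent on field names in one document.

    Each name is counted as its UTF-8 bytes plus a one-byte terminator,
    which is how BSON stores element names.
    """
    return sum(len(name.encode("utf-8")) + 1 for name in names)


saved_per_doc = field_name_bytes(LONG_NAMES) - field_name_bytes(SHORT_NAMES)
print(saved_per_doc)  # 26
```

The terminator bytes cancel out, so the saving is just the difference in name lengths: (10 + 9 + 13) - (2 + 2 + 2) = 26.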

1b : Space : Analyze

size on disk, 100 million inserts (lower is better)

• TokuMX is substantially smaller, even without compression
• In TokuMX, field name length has almost no impact on size due to compression
• MongoDB was ~10% smaller with short field names
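The effect of compression on redundant field names can be reproduced in miniature. This is only a rough sketch: it uses JSON and zlib from the Python standard library as stand-ins for BSON and TokuMX's storage engine, and it assumes the "(10/20/20)" in the test description refers to value lengths, which the slides do not confirm.

```python
import json
import random
import zlib

rng = random.Random(0)  # fixed seed so runs are repeatable


def random_word(length):
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length))


def encode(docs):
    """Serialize documents one per line as JSON (a stand-in for BSON)."""
    return "\n".join(json.dumps(d) for d in docs).encode("utf-8")


n = 10_000
values = [(random_word(10), random_word(20), random_word(20)) for _ in range(n)]
long_docs = [{"first_name": a, "last_name": b, "email_address": c} for a, b, c in values]
short_docs = [{"fn": a, "ln": b, "ea": c} for a, b, c in values]

long_raw, short_raw = encode(long_docs), encode(short_docs)
long_zip, short_zip = zlib.compress(long_raw), zlib.compress(short_raw)

raw_gap = len(long_raw) - len(short_raw)  # cost of long names, uncompressed
zip_gap = len(long_zip) - len(short_zip)  # cost of long names, compressed
print(f"raw:  {len(long_raw)} vs {len(short_raw)} bytes (gap {raw_gap})")
print(f"zlib: {len(long_zip)} vs {len(short_zip)} bytes (gap {zip_gap})")
```

Uncompressed, the long names cost exactly 26 bytes per document. After compression the gap shrinks, because a repeated name is encoded as a short back-reference regardless of its length.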

Pitfall 2 : Replication

2 : Replication

• MongoDB natively supports replication
  • High availability
  • Read scaling

• Shortcomings
  • lag, resource consumption on secondaries

• Recommended reading
  • http://blog.mongolab.com/2013/03/replication-lag-the-facts-of-life/
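Replication lag is usually measured as the gap between the primary's last applied operation and each secondary's. The sketch below computes that gap from a dictionary shaped like the output of the replSetGetStatus command (a "members" list with "name", "stateStr" and "optimeDate"). The sample data is hand-written, so no server is needed to run it.

```python
from datetime import datetime, timedelta


def replication_lag(status):
    """Return {member name: lag behind the primary} as timedeltas.

    `status` is assumed to look like replSetGetStatus output: a "members"
    list whose entries carry "name", "stateStr" and "optimeDate".
    """
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: primary["optimeDate"] - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hand-written sample; host names and times are made up for illustration.
now = datetime(2013, 6, 1, 12, 0, 0)
sample = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=42)},
    ]
}
print(replication_lag(sample))
```

Against a live replica set, the status document could come from a driver call such as pymongo's `client.admin.command("replSetGetStatus")`.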

2 : Replication : Avoidance

• TokuMX replication allows secondary servers to process replication without IO
  • Simply injecting messages into the Fractal Tree Indexes on the secondary server

• The “Hard Work” was done on the primary
  • Read-before-write
  • Uniqueness checking

• Elimination of replication lag
  • Your secondaries are fully available for read scaling!

• Run multiple secondaries on a single server

2 : Replication : Test

• Sysbench
  • Workload
    • point + range queries, update, delete, insert
    • 16 collections, 10 million rows, 16GB RAM
  • Setup
    • loaded data on single server
    • shut down and copied data folder
    • created secondary

• Ran benchmark

2 : Replication : Analyze

Note: TokuMX @ 32 TPS, MongoDB @ 12 TPS

Pitfall 3 : Declining Performance

3 : Declining Performance

• MongoDB insert/update/delete performance drops dramatically when the indexes do not fit in memory

• Operations are limited by IOPs
  • Generally 1 operation per available IO
  • Fewer when secondary indexes must be maintained: 1 IO for each

• Solution: Add RAM or Shard.
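The IO-bound ceiling described above can be written as a one-line model: one IO for the document plus one for each secondary index. This is a deliberately simplified sketch that ignores caching, journaling and batching, and the 200 IOPS figure is a made-up example rather than a measurement.

```python
def max_writes_per_second(disk_iops: float, secondary_indexes: int) -> float:
    """Rough ceiling on writes/sec once indexes no longer fit in memory.

    Simplified model: one IO for the document plus one IO for each
    secondary index that has to be maintained.
    """
    if secondary_indexes < 0:
        raise ValueError("secondary_indexes must be non-negative")
    ios_per_write = 1 + secondary_indexes
    return disk_iops / ios_per_write


# 200 IOPS is a hypothetical figure for a single disk.
for indexes in (0, 1, 3):
    print(indexes, max_writes_per_second(200, indexes))
```

Under this model, three secondary indexes cut the write ceiling to a quarter of the disk's raw IOPS, which is why adding RAM or shards is the usual response.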

3 : Declining Performance : Avoidance

• TokuMX runs on Tokutek’s Fractal Tree indexes
  • Message buffers delay IO and reduce cache disruption
  • Perform many operations per IO

• Many workloads don’t need additional memory or sharding, they just need better indexing
  • RAM = $$$
  • Sharding = $$$ + Complexity

3 : Declining Performance : Test

• indexed insertion workload (iibench)
  • http://github.com/tmcallaghan/iibench-mongodb

{ dateandtime: <date-time>, cashregisterid: 1..1000, customerid: 1..100000, productid: 1..10000, price: <double> }

• insert only, 1000 documents per insert, 100 million inserts
• indexes
  • price + customerid
  • cashregister + price + customerid
  • price + dateandtime + customerid
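For readers who want to see the iibench workload shape in code, here is a minimal sketch that builds one batch of documents with the field ranges given for this test. It is not the iibench-mongodb tool itself; the price range and the function names are assumptions, and "cashregister" in the index list is taken to mean the cashregisterid field.

```python
import random
from datetime import datetime, timezone


def make_document(rng):
    """Build one document with the field ranges listed for the test."""
    return {
        "dateandtime": datetime.now(timezone.utc),
        "cashregisterid": rng.randint(1, 1000),
        "customerid": rng.randint(1, 100000),
        "productid": rng.randint(1, 10000),
        "price": round(rng.uniform(0.0, 500.0), 2),  # price range is an assumption
    }


def make_batch(rng, size=1000):
    """One insert's worth of documents (1000 per insert in the test)."""
    return [make_document(rng) for _ in range(size)]


# The three secondary indexes, as (field, direction) pairs.
INDEXES = [
    [("price", 1), ("customerid", 1)],
    [("cashregisterid", 1), ("price", 1), ("customerid", 1)],
    [("price", 1), ("dateandtime", 1), ("customerid", 1)],
]

batch = make_batch(random.Random(42))
print(len(batch), sorted(batch[0]))
```

With a driver such as pymongo, each entry in `INDEXES` could be passed to `create_index` and each batch to `insert_many`. Every inserted document then has to update all three secondary indexes, which is what makes the workload IO-heavy.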

3 : Declining Performance : Analyze

• 100 million inserts into a collection with 3 secondary indexes

• Array Index Insertion (100 values per document)

Pitfall 4 : Concurrency

4 : Concurrency

• MongoDB originally implemented a global write lock
  • 1 writer at a time

• MongoDB v2.2 moved this lock to the database level
  • 1 writer at a time in each database

• This severely limits the write performance of servers

• 36 shards on 1 server example
  • Allows for more concurrency
  • High operational complexity
  • Google “mongodb multiple shards same server”

4 : Concurrency : Avoidance

• TokuMX performs locking at the document level
  • Extreme concurrency!

[Diagram: lock granularity across instance > database > collection > document. MongoDB v2.0 locks the whole instance, MongoDB v2.2 locks one database, TokuMX locks individual documents.]
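The difference between the three lock granularities can be illustrated with a toy model that counts how many writes in a batch could proceed at once, assuming one writer per lock. This is a sketch of the idea only, not how either server implements locking, and the sample writes are hypothetical.

```python
def lock_key(write, granularity):
    """Name of the lock a write must hold under a given locking scheme."""
    database, collection, doc_id = write
    if granularity == "instance":   # global write lock (before MongoDB v2.2)
        return ("instance",)
    if granularity == "database":   # MongoDB v2.2
        return (database,)
    if granularity == "document":   # TokuMX
        return (database, collection, doc_id)
    raise ValueError(f"unknown granularity: {granularity}")


def concurrent_writers(writes, granularity):
    """How many of these writes could run at once: one per distinct lock."""
    return len({lock_key(w, granularity) for w in writes})


# Hypothetical batch of writes: (database, collection, document id).
writes = [
    ("shop", "orders", 1),
    ("shop", "orders", 2),
    ("shop", "users", 7),
    ("logs", "events", 3),
]
for scheme in ("instance", "database", "document"):
    print(scheme, concurrent_writers(writes, scheme))
```

For this batch the model allows 1 writer with an instance lock, 2 with database locks and 4 with document locks.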

4 : Concurrency : Test

• Sysbench read-write workload
  • point and range queries, update, delete, insert
  • http://github.com/tmcallaghan/sysbench-mongodb

{ _id: 1..10000000, k: 1..10000000, c: <120 char random string ###-###-###>, pad: <60 char random string ###-###-###> }

4 : Concurrency : Analyze

Pitfall 5 : Transactions

5 : Got Transactions?

• MongoDB does not support “transactions”
  • Each operation is visible to everyone
  • There are work-arounds, Google “mongodb transactions”

• http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
  This document provides a pattern for doing multi-document updates or “transactions” using a two-phase commit approach for writing data to multiple documents. Additionally, you can extend this process to provide a rollback like functionality.
  (the document is 8 web pages long)

• MongoDB does not support multi-version concurrency control (MVCC)

• Readers do not get a consistent view of the data, as they can be interrupted by writers

• People try, Google “mongodb mvcc”

5 : Transactions : Avoidance

• ACID
  • In MongoDB, multi-insertion operations allow for partial success
    • Asked to store 5 documents, 3 succeeded
  • TokuMX offers “all or nothing” behavior
  • Document level locking

• MVCC
  • In MongoDB, queries can be interrupted by writers.
    • The effects of these writers are visible to the reader
  • TokuMX offers MVCC
    • Reads are consistent as of the operation start
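The "asked to store 5 documents, 3 succeeded" case can be shown with a toy in-memory store. This sketch models the two behaviors in plain Python; it does not call MongoDB or TokuMX, and all names in it are made up for illustration.

```python
class DuplicateKey(Exception):
    pass


def insert_each(store, docs):
    """Insert one at a time; earlier documents stay if a later one fails."""
    for doc in docs:
        if doc["_id"] in store:
            raise DuplicateKey(doc["_id"])
        store[doc["_id"]] = doc


def insert_all_or_nothing(store, docs):
    """Stage the batch on a copy and publish it only if every insert works."""
    staged = dict(store)
    insert_each(staged, docs)  # raises before `store` is touched
    store.clear()
    store.update(staged)


docs = [{"_id": i} for i in (1, 2, 3, 3, 4)]  # the 4th document collides

partial, atomic = {}, {}
for insert, store in ((insert_each, partial), (insert_all_or_nothing, atomic)):
    try:
        insert(store, docs)
    except DuplicateKey:
        pass
print(len(partial), len(atomic))
```

The one-at-a-time store ends up holding 3 of the 5 documents, while the all-or-nothing store holds none, so the caller can retry the whole batch without cleaning up first.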

• Transactions in TokuMX
  • db.runCommand("beginTransaction")
  • ... perform 1 or more operations
  • db.runCommand("rollbackTransaction") | db.runCommand("commitTransaction")

• Note: not available in sharded environments

• For more information
  • http://www.tokutek.com/2013/04/mongodb-transactions-yes/
  • http://www.tokutek.com/2013/04/mongodb-multi-statement-transactions-yes-we-can/
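The begin / commit / rollback sequence maps naturally onto a context manager. The sketch below assumes a database object with a `command(name)` method, such as a pymongo `Database` connected to a TokuMX server; that pairing is an assumption and is not tested here. A small recording stand-in is used so the example runs without a server.

```python
from contextlib import contextmanager


@contextmanager
def transaction(db):
    """Commit on success, roll back if the body raises.

    `db` is anything with a command(name) method. Using a pymongo Database
    connected to TokuMX here is an assumption, not something verified.
    """
    db.command("beginTransaction")
    try:
        yield db
    except Exception:
        db.command("rollbackTransaction")
        raise
    else:
        db.command("commitTransaction")


class RecordingDb:
    """Stand-in that records commands so the sketch runs without a server."""

    def __init__(self):
        self.commands = []

    def command(self, name):
        self.commands.append(name)


ok, failed = RecordingDb(), RecordingDb()
with transaction(ok):
    pass  # ... perform 1 or more operations
try:
    with transaction(failed):
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
print(ok.commands, failed.commands)
```

With a real driver, all three commands would most likely need to travel over the same connection, which connection pooling may not guarantee; this sketch does not address that.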

Any Questions?

Download TokuMX at www.tokutek.com/download

Register for product updates, access to premium content, and invitations at www.tokutek.com

Join the Conversation
