Transcript
Page 1: 5 Pitfalls to Avoid with MongoDB

5 Pitfalls to Avoid with MongoDB

Tim Callaghan, VP/Engineering, Tokutek

[email protected]

@tmcallaghan

Page 2: 5 Pitfalls to Avoid with MongoDB

Tokutek: Database Performance Engines

What is Tokutek?

Tokutek® offers high performance and scalability for MySQL, MariaDB and MongoDB. Our easy-to-use open source solutions are compatible with your existing code and application infrastructure.

Tokutek Performance Engines Remove Limitations

- Improve insertion performance by 20X
- Reduce HDD and flash storage requirements up to 90%
- No need to rewrite code

Tokutek Mission: Empower your database to handle the Big Data requirements of today’s applications

Page 3: 5 Pitfalls to Avoid with MongoDB

A Global Customer Base

Page 4: 5 Pitfalls to Avoid with MongoDB

Housekeeping

• This presentation will be available for replay following the event

• We welcome your questions; please use the console on the right of your screen and we will answer following the presentation

• A copy of the presentation is available upon request

Page 5: 5 Pitfalls to Avoid with MongoDB

Agenda

• Describe use cases that lead to well-known pitfalls

• How can they be avoided?

• Test, Measure, and Analyze (benchmark)

Page 6: 5 Pitfalls to Avoid with MongoDB

Pitfalls - 1982

Page 7: 5 Pitfalls to Avoid with MongoDB

Pitfalls - 2013

Page 8: 5 Pitfalls to Avoid with MongoDB

What is TokuMX?

• TokuMX = MongoDB with improved storage

• Drop-in replacement for MongoDB v2.4 applications
  • Including replication and sharding
  • Same data model
  • Same query language
  • Drivers just work
  • No Full Text or Geospatial

• Open Source
  • http://github.com/Tokutek/mongo

Page 9: 5 Pitfalls to Avoid with MongoDB

Pitfall 1 : Space

Page 10: 5 Pitfalls to Avoid with MongoDB

1a : Space

• MongoDB databases often grow quite large
  • it easily allows users to...
    • store large documents
    • keep them around for a long time

• de-normalized data needs more space

• Operational challenges
  • Big disks are cheap, but not fast
  • Cloud storage is even slower
  • Fast disks (flash) are expensive
  • Backups are large as well

• Unfortunately, MongoDB does not offer compression

• goal = use less disk/flash

Page 11: 5 Pitfalls to Avoid with MongoDB

1a : Space : Avoidance

• TokuMX offers built-in compression
  • 3 compression algorithms
  • quicklz, zlib, lzma, (none)

• Everything is compressed
  • Field names and values
  • Secondary indexes too

Page 12: 5 Pitfalls to Avoid with MongoDB

1a : Space : Test

• BitTorrent Peer Snapshot Data (~31 million documents)

• 3 Indexes : peer_id + created, torrent_snapshot_id + created, created

{ id: 1,
  peer_id: 9222,
  torrent_snapshot_id: 4,
  upload_speed: 0.0000,
  download_speed: 0.0000,
  payload_upload_speed: 0.0000,
  payload_download_speed: 0.0000,
  total_upload: 0,
  total_download: 0,
  fail_count: 0,
  hashfail_count: 0,
  progress: 0.0000,
  created: "2008-10-28 01:57:35" }

http://cs.brown.edu/~pavlo/torrent/

Page 13: 5 Pitfalls to Avoid with MongoDB

1a : Space : Analyze

size on disk, ~31 million inserts (lower is better)

Page 14: 5 Pitfalls to Avoid with MongoDB

1a : Space : Analyze

size on disk, ~31 million inserts (lower is better)

TokuMX achieved 11.6:1 compression

Page 15: 5 Pitfalls to Avoid with MongoDB

1b : Space

• MongoDB stores field names in each document
  • Lots of redundant data
  • When field names are long, documents may contain more field name data than actual values

• Google “mongodb long field names”
  • Lots of blogs and advice

• ... but descriptive schemas are useful!

Page 16: 5 Pitfalls to Avoid with MongoDB

1b : Space : Avoidance

• Again, TokuMX offers built-in compression
  • Field names are compressed along with values

• Compression algorithms love redundant data

• Be descriptive and toss that data dictionary!
  • Who knows what is in field “zq”? Not me!
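
The point that compression algorithms love redundant data can be illustrated with a small sketch. It applies Python's standard zlib module to JSON text, which is an assumption made purely for illustration: TokuMX compresses its own on-disk data rather than JSON, and the sample documents and helper names here are made up.

```python
import json
import zlib

def make_docs(names, count=1000):
    """Build `count` sample documents using the given three field names."""
    first, last, email = names
    return [
        {first: f"user{i}", last: f"name{i}", email: f"user{i}@example.com"}
        for i in range(count)
    ]

def sizes(docs):
    """Return (raw_bytes, compressed_bytes) for the documents serialized as JSON lines."""
    raw = "\n".join(json.dumps(d) for d in docs).encode("utf-8")
    return len(raw), len(zlib.compress(raw))

long_raw, long_z = sizes(make_docs(("first_name", "last_name", "email_address")))
short_raw, short_z = sizes(make_docs(("fn", "ln", "ea")))

print(f"long names : raw={long_raw} compressed={long_z}")
print(f"short names: raw={short_raw} compressed={short_z}")
print(f"raw penalty for long names       : {long_raw / short_raw:.2f}x")
print(f"compressed penalty for long names: {long_z / short_z:.2f}x")
```

The repeated field names turn into cheap back-references once compressed, so the penalty for descriptive names should shrink noticeably compared with the uncompressed case.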

Page 17: 5 Pitfalls to Avoid with MongoDB

1b : Space : Test

schema 1 - long field names (10/20/20)

{ first_name : "Tim", last_name : "Callaghan", email_address : "[email protected]" }

schema 2 - short field names (26 fewer bytes per doc)

{ fn : "Tim", ln : "Callaghan", ea : "[email protected]" }
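
The 26-byte difference between the two schemas comes from the field names alone, as this small sketch checks. It assumes one byte per ASCII character of each name; the rest of the document encoding is the same in both schemas. The collection size is taken from the 100 million inserts used in this test.

```python
long_names = ["first_name", "last_name", "email_address"]
short_names = ["fn", "ln", "ea"]

def name_bytes(names):
    """Total bytes used by field names in one document (ASCII, one byte per character)."""
    return sum(len(name.encode("utf-8")) for name in names)

saved_per_doc = name_bytes(long_names) - name_bytes(short_names)
print(saved_per_doc)  # 32 - 6 = 26

docs = 100_000_000  # the 100 million inserts used in this test
print(saved_per_doc * docs / 1e9, "GB of field names saved before compression")
```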

Page 18: 5 Pitfalls to Avoid with MongoDB

1b : Space : Analyze

size on disk, 100 million inserts (lower is better)

Page 19: 5 Pitfalls to Avoid with MongoDB

1b : Space : Analyze

size on disk, 100 million inserts (lower is better)

TokuMX is substantially smaller, even without compression

Page 20: 5 Pitfalls to Avoid with MongoDB

1b : Space : Analyze

size on disk, 100 million inserts (lower is better)

In TokuMX, field name length has almost no impact on size due to compression

With short field names, MongoDB was ~10% smaller

Page 21: 5 Pitfalls to Avoid with MongoDB

Pitfall 2 : Replication

Page 22: 5 Pitfalls to Avoid with MongoDB

2 : Replication

• MongoDB natively supports replication
  • High availability
  • Read scaling

• Shortcomings
  • lag, resource consumption on secondaries

• Recommended reading
  • http://blog.mongolab.com/2013/03/replication-lag-the-facts-of-life/

Page 23: 5 Pitfalls to Avoid with MongoDB

2 : Replication : Avoidance

• TokuMX replication allows secondary servers to process replication without IO
  • Simply injecting messages into the Fractal Tree Indexes on the secondary server

• The “Hard Work” was done on the primary
  • Read-before-write
  • Uniqueness checking

• Elimination of replication lag
  • Your secondaries are fully available for read scaling!

• Run multiple secondaries on a single server

Page 24: 5 Pitfalls to Avoid with MongoDB

2 : Replication : Test

• Sysbench
  • Workload
    • point + range queries, update, delete, insert
    • 16 collections, 10 million rows, 16GB RAM
  • Setup
    • loaded data on single server
    • shut down and copied data folder
    • created secondary

• Ran benchmark

Page 25: 5 Pitfalls to Avoid with MongoDB

2 : Replication : Analyze

Note: TokuMX @ 32 TPS, MongoDB @ 12 TPS

Page 26: 5 Pitfalls to Avoid with MongoDB

Pitfall 3 : Declining Performance

Page 27: 5 Pitfalls to Avoid with MongoDB

3 : Declining Performance

• MongoDB insert/update/delete performance drops dramatically when the indexes do not fit in memory

• Operations are limited by IOPS
  • Generally 1 operation per available IO
  • Fewer with secondary index maintenance, which costs 1 IO for each index

• Solution: Add RAM or Shard.
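
A rough back-of-the-envelope model of the bullets above, assuming every write costs one IO for the document plus one IO per secondary index once the indexes no longer fit in memory. The IOPS figures are hypothetical, and real behavior depends on caching and the storage engine.

```python
def max_ops_per_second(iops, secondary_indexes):
    """Upper bound on write ops/sec when each op costs 1 IO plus 1 IO per secondary index."""
    ios_per_op = 1 + secondary_indexes
    return iops / ios_per_op

# Hypothetical devices: a single hard disk and a flash drive.
for label, iops in [("HDD", 200), ("flash", 20_000)]:
    for indexes in (0, 3):
        rate = max_ops_per_second(iops, indexes)
        print(f"{label:5s} {indexes} secondary indexes: ~{rate:,.0f} ops/sec")
```

Under this model, three secondary indexes cut the achievable write rate to a quarter of what the device could otherwise sustain.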

Page 28: 5 Pitfalls to Avoid with MongoDB

3 : Declining Performance : Avoidance

• TokuMX runs on Tokutek’s Fractal Tree indexes
  • Message buffers delay IO and reduce cache disruption
  • Perform many operations per IO

• Many workloads don’t need additional memory or sharding, they just need better indexing
  • RAM = $$$
  • Sharding = $$$ + Complexity

Page 29: 5 Pitfalls to Avoid with MongoDB

3 : Declining Performance : Test

• indexed insertion workload (iibench)
  • http://github.com/tmcallaghan/iibench-mongodb

{ dateandtime: <date-time>,
  cashregisterid: 1..1000,
  customerid: 1..100000,
  productid: 1..10000,
  price: <double> }

• insert only, 1000 documents per insert, 100 million inserts

• indexes
  • price + customerid
  • cashregisterid + price + customerid
  • price + dateandtime + customerid
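
A minimal sketch of one iibench-style insert batch, using only the Python standard library. The field names, value ranges, batch size, and index definitions come from the slide; the helper names, the price range, and the use of random values are assumptions. The real benchmark is the one in the iibench-mongodb repository.

```python
import random
from datetime import datetime, timezone

# Secondary indexes from the slide, written as (field, direction) pairs.
SECONDARY_INDEXES = [
    [("price", 1), ("customerid", 1)],
    [("cashregisterid", 1), ("price", 1), ("customerid", 1)],
    [("price", 1), ("dateandtime", 1), ("customerid", 1)],
]

def make_document(rng):
    """One document with the value ranges listed on the slide."""
    return {
        "dateandtime": datetime.now(timezone.utc),
        "cashregisterid": rng.randint(1, 1000),
        "customerid": rng.randint(1, 100000),
        "productid": rng.randint(1, 10000),
        "price": rng.uniform(0.0, 500.0),  # price range is an assumption
    }

def make_batch(rng, size=1000):
    """One insert batch; the test uses 1000 documents per insert."""
    return [make_document(rng) for _ in range(size)]

rng = random.Random(42)
batch = make_batch(rng)
print(len(batch), sorted(batch[0].keys()))
```

With a driver such as pymongo, each entry in SECONDARY_INDEXES could be passed to `create_index` and each batch to `insert_many`; that wiring is left out here so the sketch runs without a server.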

Page 30: 5 Pitfalls to Avoid with MongoDB

3 : Declining Performance : Analyze

• 100 million inserts into a collection with 3 secondary indexes

Page 31: 5 Pitfalls to Avoid with MongoDB

3 : Declining Performance : Analyze

• 100 million inserts into a collection with 3 secondary indexes

Page 32: 5 Pitfalls to Avoid with MongoDB

3 : Declining Performance : Analyze

• Array Index Insertion (100 values per document)

Page 33: 5 Pitfalls to Avoid with MongoDB

Pitfall 4 : Concurrency

Page 34: 5 Pitfalls to Avoid with MongoDB

4 : Concurrency

• MongoDB originally implemented a global write lock
  • 1 writer at a time

• MongoDB v2.2 moved this lock to the database level
  • 1 writer at a time in each database

• This severely limits the write performance of servers

• 36 shards on 1 server example
  • Allows for more concurrency
  • High operational complexity
  • Google “mongodb multiple shards same server”

Page 35: 5 Pitfalls to Avoid with MongoDB

4 : Concurrency : Avoidance

• TokuMX performs locking at the document level
  • Extreme concurrency!

[Diagram: the hierarchy instance > database > collection > document, labeled with the level at which each system locks: MongoDB v2.0 (instance), MongoDB v2.2 (database), TokuMX (document)]
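
The effect of lock granularity can be sketched with a toy model: two writes can run at the same time only if they need different locks. This illustrates the idea only and is not how either server implements locking; the function and the sample writes are hypothetical.

```python
def max_concurrent_writers(writes, granularity):
    """Count distinct locks needed by the writes; writes sharing a lock run one at a time."""
    def lock_key(write):
        database, collection, doc_id = write
        if granularity == "instance":   # global write lock (original MongoDB)
            return ()
        if granularity == "database":   # MongoDB v2.2
            return (database,)
        if granularity == "document":   # TokuMX
            return (database, collection, doc_id)
        raise ValueError(f"unknown granularity: {granularity}")
    return len({lock_key(w) for w in writes})

# Eight pending writes to different documents spread over two databases.
writes = [("shop", "orders", i) for i in range(4)] + [("logs", "events", i) for i in range(4)]

for level in ("instance", "database", "document"):
    print(level, max_concurrent_writers(writes, level))
```

For these eight writes the model allows 1 writer with an instance-wide lock, 2 with database-level locks, and 8 with document-level locks.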

Page 36: 5 Pitfalls to Avoid with MongoDB

4 : Concurrency : Test

• Sysbench read-write workload
  • point and range queries, update, delete, insert
  • http://github.com/tmcallaghan/sysbench-mongodb

{ _id: 1..10000000,
  k: 1..10000000,
  c: <120 char random string ###-###-###>,
  pad: <60 char random string ###-###-###> }

Page 37: 5 Pitfalls to Avoid with MongoDB

4 : Concurrency : Analyze

Page 38: 5 Pitfalls to Avoid with MongoDB

4 : Concurrency : Analyze

Page 39: 5 Pitfalls to Avoid with MongoDB

Pitfall 5 : Transactions

Page 40: 5 Pitfalls to Avoid with MongoDB

5 : Got Transactions?

• MongoDB does not support “transactions”
  • Each operation is visible to everyone
  • There are work-arounds, Google “mongodb transactions”
  • http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
    This document provides a pattern for doing multi-document updates or “transactions” using a two-phase commit approach for writing data to multiple documents. Additionally, you can extend this process to provide a rollback like functionality.
    (the document is 8 web pages long)

• MongoDB does not support multi-version concurrency control (MVCC)

• Readers do not get a consistent view of the data, as they can be interrupted by writers

• People try, Google “mongodb mvcc”

Page 41: 5 Pitfalls to Avoid with MongoDB

5 : Transactions : Avoidance

• ACID
  • In MongoDB, multi-insertion operations allow for partial success
    • Asked to store 5 documents, 3 succeeded
  • TokuMX offers “all or nothing” behavior
    • Document level locking

• MVCC
  • In MongoDB, queries can be interrupted by writers.
    • The effects of these writers are visible to the reader
  • TokuMX offers MVCC
    • Reads are consistent as of the operation start
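
The difference between partial success and “all or nothing” can be shown with a toy in-memory store. This is a sketch of the semantics only: the functions are hypothetical and do not use the MongoDB or TokuMX APIs, and a duplicate _id stands in for whatever error interrupts the batch.

```python
def insert_partial(store, docs):
    """Insert one at a time; stop at the first duplicate _id, keeping earlier inserts."""
    inserted = 0
    for doc in docs:
        if doc["_id"] in store:
            return inserted  # documents stored before the error stay visible
        store[doc["_id"]] = doc
        inserted += 1
    return inserted

def insert_all_or_nothing(store, docs):
    """Apply the batch to a copy and commit only if every insert succeeds."""
    staged = dict(store)
    for doc in docs:
        if doc["_id"] in staged:
            return 0  # roll back: the original store is untouched
        staged[doc["_id"]] = doc
    store.clear()
    store.update(staged)
    return len(docs)

# Five documents, the fourth of which collides with the third.
batch = [{"_id": 1}, {"_id": 2}, {"_id": 3}, {"_id": 3}, {"_id": 5}]

partial_store, atomic_store = {}, {}
print(insert_partial(partial_store, batch), sorted(partial_store))       # 3 [1, 2, 3]
print(insert_all_or_nothing(atomic_store, batch), sorted(atomic_store))  # 0 []
```

The first call reproduces the "asked to store 5 documents, 3 succeeded" outcome, while the second leaves the store empty because the batch could not be applied as a whole.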

Page 42: 5 Pitfalls to Avoid with MongoDB

5 : Transactions : Avoidance

• Transactions in TokuMX
  • db.runCommand("beginTransaction")
  • ... perform 1 or more operations
  • db.runCommand("rollbackTransaction") | db.runCommand("commitTransaction")

• Note: not available in sharded environments

• For more information
  • http://www.tokutek.com/2013/04/mongodb-transactions-yes/
  • http://www.tokutek.com/2013/04/mongodb-multi-statement-transactions-yes-we-can/

Page 43: 5 Pitfalls to Avoid with MongoDB

Tokutek: Database Performance Engines

Any Questions?

Download TokuMX at www.tokutek.com/download

Register for product updates, access to premium content, and invitations at www.tokutek.com

Join the Conversation

