5 Pitfalls to Avoid with MongoDB


Posted on 18-Dec-2014






Learn how 5 of the most common MongoDB pitfalls can be avoided with Tokutek's TokuMX.


Slide 1: 5 Pitfalls to Avoid with MongoDB
Tim Callaghan, VP/Engineering, Tokutek
tim@tokutek.com, @tmcallaghan

Slide 2: Tokutek: Database Performance Engines
What is Tokutek? Tokutek offers high performance and scalability for MySQL, MariaDB, and MongoDB. Our easy-to-use open source solutions are compatible with your existing code and application infrastructure.
Tokutek Performance Engines remove limitations:
- Improve insertion performance by 20x
- Reduce HDD and flash storage requirements by up to 90%
- No need to rewrite code
Tokutek mission: empower your database to handle the Big Data requirements of today's applications.

Slide 3: A Global Customer Base

Slide 4: Housekeeping
- This presentation will be available for replay following the event.
- We welcome your questions; please use the console on the right of your screen and we will answer following the presentation.
- A copy of the presentation is available upon request.

Slide 5: Agenda
- Describe use cases that lead to well-known pitfalls
- How can they be avoided?
- Test, measure, and analyze (benchmark)

Slide 6: Pitfalls - 1982

Slide 7: Pitfalls - 2013

Slide 8: What is TokuMX?
- TokuMX = MongoDB with improved storage
- Drop-in replacement for MongoDB v2.4 applications, including replication and sharding
- Same data model, same query language; drivers just work
- No Full Text or Geospatial
- Open source: http://github.com/Tokutek/mongo

Slide 9: Pitfall 1: Space

Slide 10: 1a: Space
MongoDB databases often grow quite large; it easily allows users to:
- store large documents
- keep them around for a long time
- de-normalize data, which needs more space
Operational challenges:
- Big disks are cheap, but not fast
- Cloud storage is even slower
- Fast disks (flash) are expensive
- Backups are large as well
Unfortunately, MongoDB does not offer compression. Goal: use less disk/flash.
Slide 11: 1a: Space: Avoidance
TokuMX offers built-in compression:
- 3 compression algorithms: quicklz, zlib, lzma (plus none)
- Everything is compressed: field names and values, and secondary indexes too

Slide 12: 1a: Space: Test
BitTorrent peer snapshot data (~31 million documents), 3 indexes: peer_id + created, torrent_snapshot_id + created, created
{ id: 1, peer_id: 9222, torrent_snapshot_id: 4, upload_speed: 0.0000, download_speed: 0.0000, payload_upload_speed: 0.0000, payload_download_speed: 0.0000, total_upload: 0, total_download: 0, fail_count: 0, hashfail_count: 0, progress: 0.0000, created: "2008-10-28 01:57:35" }
http://cs.brown.edu/~pavlo/torrent/

Slide 13: 1a: Space: Analyze
Size on disk, ~31 million inserts (lower is better)

Slide 14: 1a: Space: Analyze
Size on disk, ~31 million inserts (lower is better). TokuMX achieved 11.6:1 compression.

Slide 15: 1b: Space
- MongoDB stores field names in each document: lots of redundant data
- When field names are long, documents may contain more field-name data than actual values
- Google "mongodb long field names": lots of blogs and advice
- ...but descriptive schemas are useful!

Slide 16: 1b: Space: Avoidance
- Again, TokuMX offers built-in compression; field names are compressed along with values
- Compression algorithms love redundant data
- Be descriptive and toss that data dictionary! Who knows what is in field "zq"? Not me.

Slide 17: 1b: Space: Test
Schema 1, long field names (10/20/20): { first_name: "Tim", last_name: "Callaghan", email_address: "tim@tokutek.com" }
Schema 2, short field names (26 fewer bytes per doc): { fn: "Tim", ln: "Callaghan", ea: "tim@tokutek.com" }

Slide 18: 1b: Space: Analyze
Size on disk, 100 million inserts (lower is better)

Slide 19: 1b: Space: Analyze
Size on disk, 100 million inserts (lower is better). TokuMX is substantially smaller, even without compression.
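The field-name overhead and the compression claim above can be illustrated in miniature. This is a hedged sketch, not TokuMX or MongoDB code: it uses JSON text as a stand-in for BSON (both embed every field name in every document) and zlib as a stand-in for TokuMX's block compression; the sample documents mirror the two schemas on the test slide.

```python
import json
import zlib

# Sample documents modeled on the slide's two schemas (hypothetical data).
LONG = {"first_name": "Tim", "last_name": "Callaghan", "email_address": "tim@tokutek.com"}
SHORT = {"fn": "Tim", "ln": "Callaghan", "ea": "tim@tokutek.com"}

def encoded_size(doc, count):
    """Bytes needed to store `count` copies of `doc` as JSON text
    (a stand-in for BSON, which also repeats every field name)."""
    return len(json.dumps(doc).encode("utf-8")) * count

def compressed_size(doc, count):
    """Bytes after zlib-compressing the same payload, mimicking how
    TokuMX compresses field names and values together."""
    payload = (json.dumps(doc) + "\n").encode("utf-8") * count
    return len(zlib.compress(payload, 6))

raw_long = encoded_size(LONG, 1000)
raw_short = encoded_size(SHORT, 1000)
print(raw_long - raw_short)   # 26 extra bytes per document, times 1000 docs
print(compressed_size(LONG, 1000), compressed_size(SHORT, 1000))
```

Uncompressed, the long-name schema pays 26 bytes per document forever; compressed, the redundant field names all but vanish, which is why the slides say descriptive names are nearly free under TokuMX.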
Slide 20: 1b: Space: Analyze
Size on disk, 100 million inserts (lower is better). In TokuMX, field-name length has almost no impact on size due to compression; MongoDB was ~10% smaller with the short field names.

Slide 21: Pitfall 2: Replication

Slide 22: 2: Replication
MongoDB natively supports replication: high availability and read scaling.
Shortcomings: lag, and resource consumption on secondaries.
Recommended reading: http://blog.mongolab.com/2013/03/replication-lag-the-facts-of-life/

Slide 23: 2: Replication: Avoidance
TokuMX replication allows secondary servers to process replication without IO, simply injecting messages into the Fractal Tree indexes on the secondary server. The hard work (read-before-write, uniqueness checking) was already done on the primary.
- Elimination of replication lag
- Your secondaries are fully available for read scaling!
- Run multiple secondaries on a single server

Slide 24: 2: Replication: Test
Sysbench workload: point + range queries, update, delete, insert; 16 collections, 10mm rows, 16GB RAM.
Setup: loaded data on a single server, shut down and copied the data folder, created a secondary, ran the benchmark.

Slide 25: 2: Replication: Analyze
Note: TokuMX @ 32 TPS, MongoDB @ 12 TPS

Slide 26: Pitfall 3: Declining Performance

Slide 27: 3: Declining Performance
MongoDB insert/update/delete performance drops dramatically when the indexes do not fit in memory. Operations are limited by IOPS: generally 1 operation per available IO, less with secondary index maintenance (1 IO for each index). Solution: add RAM or shard.

Slide 28: 3: Declining Performance: Avoidance
TokuMX runs on Tokutek's Fractal Tree indexes. Message buffers delay IO and reduce cache disruption, performing many operations per IO. Many workloads don't need additional memory or sharding, they just need better indexing. RAM = $$$; sharding = $$$ + complexity.
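The "many operations per IO" idea behind message buffering can be sketched with a toy model. This is not the real Fractal Tree algorithm, just an illustration of the principle: inserts accumulate in a memory buffer, and one simulated IO flushes a whole batch into the index, instead of one IO per insert.

```python
class BufferedIndex:
    """Toy sketch of message-buffered index maintenance (not the real
    Fractal Tree code): inserts accumulate in a memory buffer, and one
    simulated IO applies many of them to the index at once."""

    def __init__(self, buffer_size=64):
        self.buffer_size = buffer_size
        self.buffer = []
        self.index = []     # stands in for the on-disk index
        self.ios = 0        # simulated IO operations

    def insert(self, key):
        self.buffer.append(key)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.index = sorted(self.index + self.buffer)
            self.buffer = []
            self.ios += 1   # one IO covers a whole batch of inserts

idx = BufferedIndex(buffer_size=64)
for key in range(1000):
    idx.insert(key)
idx.flush()
print(idx.ios)   # 16 simulated IOs instead of 1000 one-IO-per-insert operations
```

A plain B-tree whose index no longer fits in memory pays roughly one IO per insert, as the slide describes; the buffer amortizes that cost across the batch, which is the effect the iibench results below measure.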
Slide 29: 3: Declining Performance: Test
Indexed insertion workload (iibench): http://github.com/tmcallaghan/iibench-mongodb
{ dateandtime: , cashregisterid: 1..1000, customerid: 1..100000, productid: 1..10000, price: }
Insert only, 1000 documents per insert, 100 million inserts.
Indexes: price + customerid; cashregister + price + customerid; price + dateandtime + customerid

Slide 30: 3: Declining Performance: Analyze
100mm inserts into a collection with 3 secondary indexes

Slide 31: 3: Declining Performance: Analyze
100mm inserts into a collection with 3 secondary indexes

Slide 32: 3: Declining Performance: Analyze
Array index insertion (100 values per document)

Slide 33: Pitfall 4: Concurrency

Slide 34: 4: Concurrency
MongoDB originally implemented a global write lock: 1 writer at a time. MongoDB v2.2 moved this lock to the database level: 1 writer at a time in each database. This severely limits the write performance of servers.
Work-around example: 36 shards on 1 server allows for more concurrency, but at high operational complexity. Google "mongodb multiple shards same server".

Slide 35: 4: Concurrency: Avoidance
TokuMX performs locking at the document level: extreme concurrency!
Lock granularity: MongoDB v2.0 = instance; MongoDB v2.2 = database; TokuMX = document.

Slide 36: 4: Concurrency: Test
Sysbench read-write workload: point and range queries, update, delete, insert. http://github.com/tmcallaghan/sysbench-mongodb
{ _id: 1..10000000, k: 1..10000000, c: , pad: }

Slide 37: 4: Concurrency: Analyze

Slide 38: 4: Concurrency: Analyze

Slide 39: Pitfall 5: Transactions
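The lock-granularity comparison above can be made concrete with a toy in-memory store, again a sketch rather than TokuMX internals: each document gets its own lock, so concurrent writers to different documents never serialize on a database-wide lock.

```python
import threading
from collections import defaultdict

class DocumentStore:
    """Toy store that locks at the document level, so writers to
    different documents never block each other (one database-wide
    lock, as in MongoDB v2.2, would serialize them all)."""

    def __init__(self):
        self.docs = defaultdict(int)
        self._locks = {}
        self._meta = threading.Lock()      # guards the lock table only

    def _lock_for(self, doc_id):
        with self._meta:
            return self._locks.setdefault(doc_id, threading.Lock())

    def increment(self, doc_id):
        with self._lock_for(doc_id):       # blocks only writers of this doc
            self.docs[doc_id] += 1

store = DocumentStore()
threads = [
    threading.Thread(target=lambda d=d: [store.increment(d) for _ in range(1000)])
    for d in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(dict(store.docs))    # each of the 4 documents counted to 1000
```

Each writer holds only its own document's lock, so the four threads proceed in parallel while every update still lands exactly once; that per-document isolation is what the sysbench read-write results exercise.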
Slide 40: 5: Got Transactions?
MongoDB does not support transactions; each operation is immediately visible to everyone. There are work-arounds: Google "mongodb transactions", or see http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
"This document provides a pattern for doing multi-document updates or 'transactions' using a two-phase commit approach for writing data to multiple documents. Additionally, you can extend this process to provide a rollback-like functionality." (The document is 8 web pages long.)
MongoDB does not support multi-version concurrency control (MVCC): readers do not get a consistent view of the data, as they can be interrupted by writers. People try; Google "mongodb mvcc".

Slide 41: 5: Transactions: Avoidance
ACID: In MongoDB, multi-insertion operations allow for partial success (asked to store 5 documents, 3 succeeded). TokuMX offers all-or-nothing behavior, with document-level locking.
MVCC: In MongoDB, queries can be interrupted by writers, and the effects of those writers are visible to the reader. TokuMX offers MVCC: reads are consistent as of the operation start.

Slide 42: 5: Transactions: Avoidance
Transactions in TokuMX:
db.runCommand({beginTransaction})
... perform 1 or more operations ...
db.runCommand(rollbackTransaction) | db.runCommand(commitTransaction)
Note: not available in sharded environments.
For more information:
http://www.tokutek.com/2013/04/mongodb-transactions-yes/
http://www.tokutek.com/2013/04/mongodb-multi-statement-transactions-yes-we-can/

Slide 43: Tokutek: Database Performance Engines
Any questions?
- Download TokuMX at www.tokutek.com/download
- Register for product updates, access to premium content, and invitations at www.tokutek.com
- Join the conversation
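The two-phase-commit work-around cited on slide 40 can be sketched in miniature. This is a hedged toy model of the pattern from the MongoDB tutorial, run against plain Python dicts rather than collections, with invented field names (`state`, `pending`); the real tutorial also covers crash recovery and rollback, which are omitted here.

```python
def two_phase_transfer(accounts, txns, src, dst, amount):
    """Toy model of MongoDB's two-phase-commit pattern over plain dicts.
    The transaction record moves initial -> pending -> applied -> done,
    and each account keeps the pending transaction id so a recovery
    pass could finish or undo a half-applied transfer."""
    txn_id = len(txns)
    txns[txn_id] = {"state": "initial", "src": src, "dst": dst, "amount": amount}

    # Phase 1: mark the transaction pending, then apply it to both
    # documents while tagging each with the transaction id.
    txns[txn_id]["state"] = "pending"
    for acct, delta in ((src, -amount), (dst, +amount)):
        accounts[acct]["balance"] += delta
        accounts[acct]["pending"].append(txn_id)
    txns[txn_id]["state"] = "applied"

    # Phase 2: commit by clearing the pending markers.
    for acct in (src, dst):
        accounts[acct]["pending"].remove(txn_id)
    txns[txn_id]["state"] = "done"
    return txn_id

accounts = {"A": {"balance": 100, "pending": []},
            "B": {"balance": 0, "pending": []}}
txns = {}
two_phase_transfer(accounts, txns, "A", "B", 30)
print(accounts)
```

Even this stripped-down version needs a transaction record, per-document bookkeeping, and two distinct phases per transfer, which is the deck's point: the application-level work-around is far heavier than a native beginTransaction/commitTransaction pair.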