Growing up MongoDB
Kiril Savino - CTO, GameChanger
@kirilnyc
About Me.
Lead Engineer, Higher One
Lead Engineer, DoubleClick
Lead Engineer -> CTO, ShopWiki
Director Engineering, Conductor
Founder & CTO, GameChanger Media
10 years Oracle and MySQL, 4 MongoDB
pre-$not
About GameChanger.
Growing up.
865,443,426+
3 terabytes
16 nodes
240GB RAM, 8TB SSD storage
120,000 ops/s sustained
Learnings.
Schemas
Concurrency
Availability
Firefighting
1. Schemas!
Schema-less.
I do not think that word means what you think it means.
Be abnormal.
Schema-less does mean not having to separate data for modeling reasons
Focus on data usage patterns along with semantics
You're going to have to do this anyway: start now and scale up easy
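A minimal sketch of what "be abnormal" can look like in practice. The team and player records, their field names, and the `denormalize` helper are all hypothetical and are not taken from the talk; the point is only that a document shaped around how it is read (one roster page, one read) replaces a join.

```python
# Hypothetical data: a team and its players, modeled two ways.

# Normalized, relational-style: two record sets joined by team_id at read time.
teams = [{"_id": "t1", "name": "Tigers"}]
players = [
    {"_id": "p1", "team_id": "t1", "name": "Ana", "number": 7},
    {"_id": "p2", "team_id": "t1", "name": "Ben", "number": 12},
]


def denormalize(team, all_players):
    """Embed a team's players in the team document so one read serves the roster."""
    roster = [
        {"name": p["name"], "number": p["number"]}
        for p in all_players
        if p["team_id"] == team["_id"]
    ]
    # Copy the team fields and attach the embedded roster.
    return {**team, "roster": roster}


team_doc = denormalize(teams[0], players)
```

Whether embedding is the right call depends on the usage pattern: it suits data that is read together and bounded in size.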
Go monolithic.
Learn to query, then forget and pretend MongoDB is a really fancy, full-featured KV store.
Querying secondary data is a waste
Scans & indexed queries are slow
Memory fragmentation can kill you
Use MongoDB's strengths
Garbage in...
Validation is your problem
Don't let inconsistency linger
Know what parts of schema are flexible
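Since the database will not reject malformed documents on its own, validation has to live in application code. The sketch below is one possible shape for it; the `validate_game` function and every field name in it are assumptions made for illustration. It separates a rigid part of the schema (required fields and types) from a flexible part (`extras`).

```python
def validate_game(doc):
    """Return a list of problems; an empty list means the document is acceptable."""
    errors = []
    # Rigid part of the schema: these fields must exist with these types.
    required = {"home_team": str, "away_team": str, "score": dict}
    for field, expected in required.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"wrong type for {field}: {type(doc[field]).__name__}")
    # Flexible part: "extras" may hold anything, but must be a dict if present.
    if "extras" in doc and not isinstance(doc["extras"], dict):
        errors.append("extras must be a dict")
    return errors


good = validate_game({"home_team": "Tigers", "away_team": "Owls", "score": {}})
bad = validate_game({"home_team": "Tigers", "away_team": 3})
```

Running a check like this before every write is one way to keep inconsistency from lingering.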
Index wisely.
Data size
Insert/Update speed
Bad schema smell
2. Concurrency!
(A)CID.
Good schema design provides for basic atomicity at the document level
Obviates the need for transactions in many trivial cases
Your friends.
$set/$unset
$push/$pull
$addToSet
findAndModify
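To illustrate what these operators express, here is a toy model in plain Python. `apply_update` is a hypothetical helper that handles top-level fields only; it is not MongoDB's implementation and ignores many details (dotted paths, `$each`, query-style `$pull` conditions). In a real deployment the same `update` dictionary would be sent to the server, for example through pymongo's `update_one`, and applied atomically to the single matching document.

```python
import copy


def apply_update(doc, update):
    """Toy, top-level-only model of a few update operators (not MongoDB's code)."""
    out = copy.deepcopy(doc)
    for field, value in update.get("$set", {}).items():
        out[field] = value
    for field in update.get("$unset", {}):
        out.pop(field, None)
    for field, value in update.get("$push", {}).items():
        out.setdefault(field, []).append(value)
    for field, value in update.get("$pull", {}).items():
        if field in out:
            out[field] = [v for v in out[field] if v != value]
    for field, value in update.get("$addToSet", {}).items():
        items = out.setdefault(field, [])
        if value not in items:
            items.append(value)
    return out


# Hypothetical document and update, for illustration only.
game = {"_id": "g1", "status": "scheduled", "fans": ["ana"], "draft": True}
update = {
    "$set": {"status": "live"},
    "$unset": {"draft": ""},
    "$addToSet": {"fans": "ana"},  # already present, so no duplicate is added
    "$push": {"plays": "strike"},
}
updated = apply_update(game, update)
```

The design point is that the client describes the change instead of reading the document, modifying it, and writing it back, so there is no window for a concurrent writer to be overwritten.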
Two-phase commits.
Optimistic locking.
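One common way to do optimistic locking is a version field: read the document, then write only if the version is unchanged, and retry on conflict. The sketch below simulates that in memory. `VersionedStore`, `add_run`, and the `version` and `runs` fields are hypothetical; the internal lock stands in for the per-document atomicity a conditional update would get from the database.

```python
import threading


class VersionedStore:
    """In-memory stand-in for a collection; the lock mimics per-document atomicity."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def insert(self, doc_id, fields):
        with self._lock:
            self._docs[doc_id] = {**fields, "version": 0}

    def read(self, doc_id):
        with self._lock:
            return dict(self._docs[doc_id])

    def update_if_version(self, doc_id, expected_version, changes):
        """Apply changes only if nobody else has written since we read."""
        with self._lock:
            doc = self._docs[doc_id]
            if doc["version"] != expected_version:
                return False  # conflict: the caller should re-read and retry
            doc.update(changes)
            doc["version"] += 1
            return True


def add_run(store, doc_id, retries=5):
    """Read-modify-write loop that retries when the version check fails."""
    for _ in range(retries):
        doc = store.read(doc_id)
        if store.update_if_version(doc_id, doc["version"], {"runs": doc["runs"] + 1}):
            return True
    return False


store = VersionedStore()
store.insert("g1", {"runs": 0})
stale = store.read("g1")           # another client reads version 0
succeeded = add_run(store, "g1")   # this write bumps the version to 1
rejected = not store.update_if_version("g1", stale["version"], {"runs": 99})
```

The stale writer is refused instead of silently overwriting the newer value, which is the whole point of the technique.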
External transactions.
Eventual Consistency.
Write canonical data first
Ensure queuing of propagation
Guarantee queue entry completeness
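The three bullets above can be sketched as a tiny in-memory pipeline. Everything here is an assumption for illustration: the dictionaries stand in for collections, the `deque` stands in for a durable queue, and the names are made up. Note that each queue entry carries all the data a worker needs, so it can be applied without consulting anything else.

```python
from collections import deque

canonical = {}       # source-of-truth documents, keyed by game id
team_summaries = {}  # denormalized copy, brought up to date later
queue = deque()      # stand-in for a durable queue


def record_score(game_id, team_id, runs):
    # 1. Write the canonical data first.
    canonical[game_id] = {"team_id": team_id, "runs": runs}
    # 2. Queue the propagation. The entry is complete: a worker needs
    #    nothing beyond the entry itself to apply it.
    queue.append({"team_id": team_id, "game_id": game_id, "runs": runs})


def drain():
    """Apply every pending entry to the denormalized copy."""
    while queue:
        entry = queue.popleft()
        team_summaries[entry["team_id"]] = {
            "last_game": entry["game_id"],
            "last_runs": entry["runs"],
        }


record_score("g1", "t1", 5)
stale_view = dict(team_summaries)  # propagation has not run yet
drain()
```

Between `record_score` and `drain` the secondary copy is behind the canonical one; that window is what "eventual" means here.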
3. Availability!
“/dev/null is web scale”
Durability.
Journaling.
OK?
Replication.
Moar = better?
4. Firefighting!
Test your capacity.
Naïve throughput testing with real hardware
Clone prod configuration
Consider copying data or subset
Start with crude approximations
Get as close to “real load” as makes sense
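A crude first approximation of a throughput test can be as small as a timing loop. The harness below is a sketch: `measure_ops_per_second` is a hypothetical helper, and the dictionary write is only a placeholder for a real database call against a clone of production.

```python
import time


def measure_ops_per_second(operation, total_ops):
    """Run operation(i) total_ops times and return the observed rate."""
    start = time.perf_counter()
    for i in range(total_ops):
        operation(i)
    elapsed = time.perf_counter() - start
    return total_ops / elapsed if elapsed > 0 else float("inf")


# Placeholder workload: swap in an insert or query against a test cluster.
fake_collection = {}


def fake_insert(i):
    fake_collection[i] = {"_id": i, "payload": "x" * 100}


ops_per_second = measure_ops_per_second(fake_insert, 10_000)
```

A single-threaded loop understates what a cluster can sustain, so treat the number as a floor and move toward realistic concurrency and data as the test matures.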
Model your growth.
db.stats()
db.<collection>.stats()
avg doc size, avg index size / doc
growth rates / collection
approx active portion / collection
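The numbers listed above feed a simple back-of-the-envelope model. The sketch below assumes that indexes must fit in RAM in full, that only the active fraction of documents does, and that growth compounds monthly; those are modeling assumptions, not claims from the talk. `count`, `avgObjSize`, and `totalIndexSize` are fields reported by `db.<collection>.stats()`; the sample values are invented.

```python
import math


def working_set_bytes(stats, active_fraction):
    """Estimate RAM needed: all indexes plus the active share of documents."""
    active_docs = stats["count"] * active_fraction
    return stats["totalIndexSize"] + active_docs * stats["avgObjSize"]


def months_until_full(current_bytes, ram_bytes, monthly_growth_rate):
    """Months of compound growth before the working set exceeds RAM."""
    if current_bytes >= ram_bytes:
        return 0
    ratio = ram_bytes / current_bytes
    return math.ceil(math.log(ratio) / math.log(1 + monthly_growth_rate))


# Invented sample values shaped like collection stats output.
stats = {"count": 1_000_000, "avgObjSize": 2_000, "totalIndexSize": 200_000_000}
needed = working_set_bytes(stats, active_fraction=0.25)
runway = months_until_full(needed, ram_bytes=8_000_000_000, monthly_growth_rate=0.10)
```

Re-running a model like this per collection as real stats come in shows which collection will hit the memory ceiling first.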
Read your logs.
{...} ntoreturn:1 keyUpdates:0 numYields: 136 locks(micros) r:368727 reslen:78 199ms
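Log lines like the one above are easier to reason about once they are pulled apart. The parser below is a sketch that assumes the fields are space-separated `key:value` pairs with the duration in milliseconds at the end of the line; `parse_slow_op` is a hypothetical helper, and the `{...}` portion is left elided as in the slide.

```python
import re

LINE = "{...} ntoreturn:1 keyUpdates:0 numYields: 136 locks(micros) r:368727 reslen:78 199ms"


def parse_slow_op(line):
    """Extract numeric key:value fields and the trailing duration from a log line."""
    fields = {}
    # Matches pairs such as "reslen:78" and "numYields: 136".
    for key, value in re.findall(r"(\w+):\s*(\d+)", line):
        fields[key] = int(value)
    # The operation's total duration is the final token, e.g. "199ms".
    duration = re.search(r"(\d+)ms\s*$", line)
    if duration:
        fields["millis"] = int(duration.group(1))
    return fields


parsed = parse_slow_op(LINE)
```

Here `parsed["r"]` is the read-lock time in microseconds and `parsed["millis"]` the total duration. A tiny `reslen` paired with a large `numYields` is often a hint that the operation touched far more data than it returned.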
Don’t scan.
So...
Schema-less != no schema
ACID overrated; concurrency not
High availability is up to you
Understand the mechanics
Thanks!
Kiril Savino
CTO, GameChanger Media
www.GameChanger.io
@kirilnyc
kirilsavino.com/blog
Next Sessions at 3:40
5th Floor:
West Side Ballroom 3&4: Advanced Replication Internals
West Side Ballroom 1&2: Building a High-Performance Distributed Task Queue on MongoDB
Juilliard Complex: WhiteBoard Q&A
Lyceum Complex: Ask the Experts
7th Floor:
Empire Complex: Managing a Maturing MongoDB Ecosystem
SoHo Complex: MongoDB Indexing Constraints and Creative Schemas