mongodb meetup
Post on 06-May-2015
Eytan Daniyalzade (@daniyalzade)
http://bit.ly/cb_mongodb_meetup
MongoDB & EC2: A Love Story?
Contents
● Chartbeat
● Architecture
● MongoDB & EC2 Challenges
● Happy Ending: MongoDB & EC2
● Takeaways
chartbeat
Chartbeat: real-time analytics service
● 18-person startup in New York
● part of Betaworks
● peaking at just under 5M concurrents daily
○ up from 1M in July 2010
What chartbeat Provides
● real-time view of site performance
○ top pages
○ new/returning visitors
○ traffic flow
■ where are people coming from
■ where are people going to
● historic replay for the last 30 days
the architecture
Architecture, Browser
Part 1:
<head>
<script type="text/javascript">
var _sf_startpt = (new Date()).getTime();
</script>
...

Part 2:
...
function loadChartbeat() {
  // insert script tag
}
window.onload = loadChartbeat;
</body>
(highly simplified)
Ping is standard beacon logic, i.e. loading a 1x1 image.
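The ping endpoint above can be sketched in a few lines: parse the metrics out of the query string and answer with a transparent 1x1 GIF. The names below (`handle_ping`, the query fields) are illustrative only; the real collector is the libevent C backend described in the next slides.

```python
# Minimal sketch of a 1x1-pixel beacon endpoint (illustrative names,
# not Chartbeat's actual handler).
from urllib.parse import parse_qsl

# A transparent 1x1 GIF: the smallest payload a browser will happily load.
PIXEL_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00"
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;"
)

def handle_ping(query_string):
    """Parse the metrics carried in the query string, return the pixel."""
    metrics = dict(parse_qsl(query_string))
    # ...hand `metrics` to the real-time aggregation backend here...
    return metrics, PIXEL_GIF
```

Calling `handle_ping("h=example.com&p=/story")` yields the parsed metrics and the GIF bytes to serve back.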
Architecture, Backend
● custom libevent-based C backend
○ real-time collection and aggregation
● real-time system in-memory only
● background queue jobs snapshot every x minutes
○ Gearman
● historical data
○ mostly in MongoDB
Why Chartbeat uses MongoDB
● Pure JSON all along
○ Live API
○ Historical data
○ No mapping back and forth
● Fast Inserts (fire and forget)
● Flexible Schema
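"Fire and forget" in pymongo terms means an unacknowledged write. A minimal sketch with hypothetical collection and field names (in the pymongo of this era unacknowledged writes were the default; in current pymongo you opt in with write concern w=0):

```python
import time

def make_ping_doc(host, path, visitors):
    """One real-time snapshot document (hypothetical field names)."""
    return {"host": host, "path": path, "visitors": visitors,
            "ts": int(time.time())}

# Fire and forget: don't wait for the server to acknowledge the write.
# With a modern pymongo client you would request it explicitly, e.g.:
#
#   client = pymongo.MongoClient(w=0)
#   client.chartbeat.pings.insert_one(
#       make_ping_doc("example.com", "/story", 42))
```

The trade-off is the one the slide implies: inserts are fast because the client never blocks on the server, but failed writes go unnoticed.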
Why Chartbeat uses EC2
● Elastic Capacity
● No trips to datacenter
● EBS snapshots
Chartbeat & MongoDB & EC2 (1)
● 3 clusters
○ 1 for each product
○ 1 as a caching layer
○ 2-4 instances/cluster
● m2.2xlarge instances
○ 34.2 GB memory
○ Ubuntu 10.04
○ RAID0 over 4 x 1 TB EBS volumes
● Dedicated snapshot server
○ Shared among clusters
○ Serves as an arbiter as well
Chartbeat & MongoDB & EC2 (2)
Cluster View
MongoDB & EC2 Challenges
● Instances disappear
○ MongoDB can have long recovery operations
○ MongoDB is (was) not ACID compliant; an unclean shutdown could corrupt your data
● Poor IO performance on EBS
○ MongoDB has a global read/write lock
● Variable IO performance on EBS
○ Could cause replication issues
Question:
Disappearing Instances
Instances Disappearing - Master/Slave
● Down-time :(
● Slave promotion = headache
○ New instance
○ Copy oplog
○ Code change
○ Long/manual/error-prone
Instances Disappearing - Replica Sets
● No down-time :) yay!
● Automatic failover on writes
● Eventual failover on reads
● No code change
Instances Disappearing - Replica Sets (caveats)
● pymongo driver reads/writes from primary
○ pymongo 2.1 will fix this
● chartbeat pymongo driver
○ based on MasterSlaveConnection
○ writes to primary
○ distributes reads among secondaries
○ automatic failover
○ eventual read re-distribution
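The read-distribution behaviour described above can be sketched as a round-robin over healthy secondaries with fallback to the primary. This is an illustration, not Chartbeat's actual driver code:

```python
import itertools

class ReadDistributor:
    """Round-robin reads across healthy secondaries, falling back to
    the primary when none is available (a sketch of the behaviour
    described above, not Chartbeat's actual driver)."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        self._idx = itertools.cycle(range(len(self.secondaries) or 1))

    def pick_for_read(self, healthy):
        """Pick a node for a read; `healthy` is the set of nodes up."""
        for _ in self.secondaries:
            node = self.secondaries[next(self._idx)]
            if node in healthy:
                return node
        return self.primary  # automatic failover: no healthy secondary
```

Modern pymongo provides this natively through read preferences (e.g. `ReadPreference.SECONDARY_PREFERRED`), which is why the custom driver was only needed in this era.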
Instances Disappearing - Fact of Life
● Accept this fact of life
● Always snapshot
○ Dedicated snapshot server
○ Hidden, i.e. no reads
● Automate everything
○ puppet
■ New instance from scratch within a minute
○ python-boto
■ Script all EC2 interaction
■ new_instance.py
■ mount_volumes_from_snap.py -o iid -n iid
■ snapshot_mongo.py
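A snapshot_mongo.py-style script boils down to: quiesce mongod, snapshot each EBS volume in the RAID set, unquiesce. A hedged sketch, where `conn` duck-types a boto EC2 connection (boto 2's `EC2Connection` exposes `create_snapshot(volume_id, description)`) and `mongo_admin` stands in for whatever handle issues MongoDB's fsync+lock and unlock:

```python
def snapshot_mongo_volumes(conn, volume_ids, mongo_admin=None,
                           description="mongodb snapshot"):
    """Snapshot every EBS volume backing the RAID0 set.

    `conn` duck-types a boto EC2 connection; `mongo_admin`, if given,
    locks mongod (fsync + block writes) around the snapshots so the
    on-disk state is consistent across all four volumes.
    """
    if mongo_admin is not None:
        mongo_admin.lock()      # flush writes to disk, block new ones
    try:
        return [conn.create_snapshot(vid, description)
                for vid in volume_ids]
    finally:
        if mongo_admin is not None:
            mongo_admin.unlock()
```

Snapshotting all volumes inside one lock window matters here: with RAID0 striping, snapshots taken at different points in time cannot be reassembled into a consistent array.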
Instances Disappearing - Caveats
● New volumes - slow!!!
○ EBS loads blocks lazily
● Warm up EBS & file cache before use
○ Options
■ Slowly direct the reads (app by app)
■ Run cache warm-up scripts
○ Not automated currently
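A cache warm-up script can be as simple as a sequential read of the whole device, forcing EBS to fetch every lazily-loaded block. A sketch (the path would be whatever backs the data directory, e.g. the RAID device):

```python
def warm_up(path, chunk_mb=8):
    """Sequentially read a device or file end to end, so EBS pulls in
    every block before the node takes production reads. Returns the
    number of bytes read."""
    chunk = chunk_mb * 1024 * 1024
    total = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                return total
            total += len(data)
```

Reading the raw device warms EBS itself; re-reading the database files afterwards additionally warms the OS page cache that mongod's memory-mapped files depend on.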
Poor IO Performance on EBS
Poor IO Performance on EBS
● XFS & RAIDing Helps
but,
● Disk IO varies over time
● MongoDB holds global lock on writes
● Query of death
○ Grinding halt if not careful
Case Study: Historical Data
● For historical data, we store time series.
{
  key: <key>,
  ts: <ts>,
  values: {metric1: int1, metric2: int2},
  meta: {}
}
● High insert rate vs fast historical reads
○ Optimize reads or writes?
● Fast inserts: ~1 MB/sec (append only)
○ No disk seek
● Historical reads: painfully slow
Faster Reads Through Cache DB
● Avoid reading from disk
● Favor reads over writes
● Aim for disk & memory locality
{
  day_ts: <day_ts>,
  key: <key>,
  values: {metric1: list(int), metric2: list(int)}
}
● Data for historical reads resides together
● .append() to a list could cause disk fragmentation
Avoid Fragmentation w/ Preallocation
● Fragmentation causes:
○ Inefficient disk usage
○ Slower writes (due to block allocation)
● Preallocate daily arrays instead
○ Pros:
■ No fragmentation
■ Writes cause no change in data size
○ Cons:
■ Wasteful (we don't know keys ahead of time)
■ Requires heavy disk IO, ~7 MB/sec (~60 Mbit/sec on EBS)
● Conclusion: spread preallocation over 1 hour
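The preallocation idea can be sketched as two pieces: building a zero-filled daily document (so in-place writes never grow it) and deterministically spreading each key's preallocation across the hour. The field names and per-minute granularity below are assumptions for illustration:

```python
import zlib

SLOTS = 24 * 60   # one array slot per minute (assumed granularity)

def preallocated_doc(key, day_ts, metrics=("metric1", "metric2")):
    """Zero-filled daily document: later writes update slots in place,
    so the record never grows and never fragments."""
    return {"day_ts": day_ts, "key": key,
            "values": {m: [0] * SLOTS for m in metrics}}

def prealloc_offset(key, window=3600):
    """Deterministic per-key offset (in seconds) so preallocation for
    all keys is spread over an hour instead of one ~7 MB/sec burst."""
    return zlib.crc32(key.encode()) % window
```

Hashing the key to an offset keeps the schedule stable from day to day without any coordination: each key always preallocates at the same second within the hour.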
EC2 Performance is Unpredictable
EC2 Unpredictability - Challenges
● Resource contention in virtualized environment
● EBS and Network IO performance varies drastically
● RAID0 over 4 disks = 4 x risk
Heavy Monitoring (1)
● Track individual disk performance over time
● Create a new instance if the disk is not getting better
Heavy Monitoring (2)
● Monitor replication lag
● Remove from read mix if lag gets too high
○ Incorrect data
○ Strain on primary
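Replication lag can be computed from the `replSetGetStatus` command output by comparing each secondary's `optimeDate` against the primary's. A sketch over a status-shaped dict (with pymongo you would obtain it via `client.admin.command("replSetGetStatus")`):

```python
def replication_lag(status):
    """Per-secondary lag in seconds, from a replSetGetStatus-shaped
    dict (each member carries `name`, `stateStr` and `optimeDate`)."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
            for m in members if m["stateStr"] == "SECONDARY"}
```

A monitoring loop would call this periodically and pull any secondary whose lag exceeds a threshold out of the read mix, exactly as the slide describes.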
Heavy Monitoring (3)
● Track slow queries / opcounts / page faults / IO volume
○ Tweak indexes accordingly
○ Limit requested data size if you can
Open Issues
● More granular page-fault / memory-usage information
○ Difficult due to mmap
● Multi-datacenter usage
● Burn-in scripts
● Sharding
○ Tipping point will be insert volume
○ Or inefficient read memory usage
● Better understand replication failures
Take-aways (1)
● Automate everything
○ Instance creation, snapshotting, mount/unmount
● Strive for high locality & low fragmentation
● Repeatedly revise schema/index
● Heavily monitor
○ Server: IO/mem/disk
○ MongoDB: Opcounts/Index hits/Slow queries
○ Cluster: Replication lag
○ Application: CRUD times
Take-aways (2)