sf mongodb user group : using mongodb for ign's social platform

Using MongoDB for IGN’s Social Platform

SF Bay Area MongoDB User Group, Tuesday, Feb 15th, 2011

About Me

Manish Pandit

@lobster1234, http://about.me/mpandit

About IGN’s Social Platform

• An API to connect the gamer community with editors, games, and other gamers, and to help lay the foundation for premium content discovery as well as UGC (user-generated content)

• In beta since Sept 2010
• 5M+ activities
• 20K unique visitors (UVs) a day, ~100K page views (PVs) a day

Architecture

• REST-based API, built in Java
• Entities are People, MediaItems, Activities, Comments, Notifications, Status
• Interfaces across IGN.com as well as other social networks
• Caching tier based on memcached
• MySQL and MongoDB for persistence
• PHP/Zend front end

MongoDB Usage

• Activity Streams: ActivityStrea.ms standard
• Activity Caching (more on this later!)
• Activity Commenting
• Points: also extends to badges
• Block lists, ban lists
• Notifications: system notifications
• Analytics: activity snapshot for a user

Alternatives

• MySQL
  – Obvious alternative, already used for storing person data, game data, and relationships
  – Did not work for activities:
    • Massive joins to filter newsfeeds, i.e. activities from friends
    • Fairly normalized schema for activities
    • Too many schema changes as requirements changed and new types of activities came into the picture; ALTER TABLE started to take hours
    • Optimization led to a large number of indexes, slowing down the writes

Alternatives

• Voldemort
  – Used for the initial release, Sept 2010
    • Fast and simple implementation of Amazon Dynamo
  – Did not work out for long:
    • We needed the ability to query the data
    • We needed more than key-value pairs
    • No in-place updates out of the box; we had to write custom code to handle concurrent update conflicts (read-repair)
    • Not a lot of developer velocity compared to MongoDB

Other alternatives

• Cassandra
  – Learning curve, lack of querying
  – Did not want to bite off more than we could chew
• CouchDB
  – Map-reduce queries, views
  – The REST-based API is good, but a chatty HTTP interface to a database hurts performance

Configuration

• Server:
  – 1 master, 2 slaves (load balanced through NetScaler)
  – 2 extra slaves which are not queried (replicate!!)
  – Version 1.6.1
• Client:
  – Java driver (2.1)
  – Ruby driver (1.2)
• Mappers:
  – Morphia for Java
• Connections per host: 200, #hosts = 4
• Oplog size: 1 GB, about 2.5 hours
• syncdelay: 60s (default)
• Hardware: 2-core, 6 GB virtualized machine
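The numbers above can be sanity-checked with a little arithmetic. An oplog of 1 GB that lasts about 2.5 hours implies an average write rate, and that rate tells you how long a larger oplog would last. The Python sketch below does this calculation. It assumes a steady write rate. It also reads "connections per host: 200, #hosts = 4" as four client hosts, each with a 200-connection pool. That reading is our interpretation; the slide does not spell it out.

```python
# Back-of-the-envelope checks for the configuration above.
# Assumptions: a steady write rate, and "#hosts = 4" meaning four client
# hosts, each with a driver pool of up to 200 connections.

GIB = 1024 ** 3


def implied_write_rate(oplog_bytes: int, window_hours: float) -> float:
    """Average oplog write rate (bytes/sec) implied by an observed window."""
    return oplog_bytes / (window_hours * 3600)


def oplog_window_hours(oplog_bytes: int, write_bytes_per_sec: float) -> float:
    """Hours of writes the oplog can hold before it wraps around."""
    return oplog_bytes / write_bytes_per_sec / 3600


def max_client_connections(per_host: int, hosts: int) -> int:
    """Worst-case number of connections one mongod may have to accept."""
    return per_host * hosts


if __name__ == "__main__":
    rate = implied_write_rate(1 * GIB, 2.5)
    print(f"implied oplog write rate: {rate / 1024:.1f} KiB/s")
    # At the same write rate, a 2 GiB oplog would cover twice the window.
    print(f"window with a 2 GiB oplog: {oplog_window_hours(2 * GIB, rate):.1f} h")
    print(f"worst-case connections: {max_client_connections(200, 4)}")
```

With the slide's numbers this works out to roughly 116.5 KiB/s of oplog traffic. A slave that falls more than about 2.5 hours behind can no longer catch up from the oplog and needs a full resync.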

Maintenance

• Data defragmentation
  – Slaves: by running them on a different port
  – Master: by scheduling downtime
• Collection trimming
  – The scripts block during removes
  – Bulk removes kill the slaves, spiking CPU to 100%
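Bulk removes hurt the slaves. A common alternative, not something the slides say IGN did, is to trim in small batches and pause between them so replication can keep up. Below is a minimal Python sketch of that approach. `remove_batch` is a hypothetical callback standing in for whatever driver call performs the delete. The batch size and pause are arbitrary starting values, not tuned recommendations.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(ids: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def trim_in_batches(ids: Sequence, remove_batch: Callable[[List], None],
                    batch_size: int = 500, pause_sec: float = 0.5) -> int:
    """Delete documents one batch at a time, sleeping between batches so that
    slaves replaying the oplog get a chance to catch up.
    Returns the number of ids handed to `remove_batch`."""
    removed = 0
    for chunk in batches(ids, batch_size):
        # Hypothetical callback, e.g. a remove with {"_id": {"$in": chunk}}.
        remove_batch(chunk)
        removed += len(chunk)
        time.sleep(pause_sec)
    return removed


if __name__ == "__main__":
    seen: List[List[int]] = []
    total = trim_in_batches(list(range(1200)), seen.append, batch_size=500, pause_sec=0)
    print(total, [len(b) for b in seen])  # 1200 [500, 500, 200]
```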

Monitoring

• Nagios
  – TCP port monitoring
  – Disk space monitoring
  – CPU monitoring
• Munin
  – Mongo connections
  – Memory usage
  – Ops/second
  – Write lock %
  – Collection sizes (in terms of # of documents)
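Munin plugins are small executables. When called with the argument `config`, they describe the graph. On a normal run, they print one `<field>.value <number>` line per series. The Python sketch below formats a "collection sizes" graph in that style. The counts are passed in here, whereas a real plugin would query them from the server. The database name `social` and the sample numbers are made up for illustration.

```python
import re
import sys
from typing import Dict, List


def field_name(collection: str) -> str:
    """Munin field names may only use letters, digits and underscores,
    and must not start with a digit."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", collection)
    return "_" + name if name[:1].isdigit() else name


def config_lines(db_name: str, counts: Dict[str, int]) -> List[str]:
    """Output for `plugin config`: graph metadata plus one label per field."""
    lines = [
        f"graph_title MongoDB collection sizes ({db_name})",
        "graph_vlabel documents",
        "graph_category mongodb",
    ]
    lines += [f"{field_name(c)}.label {c}" for c in sorted(counts)]
    return lines


def value_lines(counts: Dict[str, int]) -> List[str]:
    """Output for a normal run: one `<field>.value <n>` line per collection."""
    return [f"{field_name(c)}.value {counts[c]}" for c in sorted(counts)]


def main(argv: List[str], db_name: str, counts: Dict[str, int]) -> None:
    wants_config = argv[1:2] == ["config"]
    lines = config_lines(db_name, counts) if wants_config else value_lines(counts)
    print("\n".join(lines))


if __name__ == "__main__":
    # Hypothetical sample numbers; a real plugin would query the server.
    sample = {"activities": 5_000_000, "notifications": 120_000}
    main(sys.argv, "social", sample)
```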

Backup or prepping for O Shit!

• NetApp filer-based snapshots
  – Make sure to run {fsync:1, lock:1} on one slave before taking the snapshot
• Hourly dumps via cron job
  – Using mongodump
• Incremental backup via the oplog
  – Replay the oplog instead of relying on a snapshot

• Delayed slaves
  – Not recommended, as this almost guarantees data loss proportional to the delay, while shortening the delay to limit that loss also shortens the time you have to react
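Incremental backup via the oplog works like this: restore the last full copy, then re-apply only the oplog entries written after it. The Python sketch below models that process with an in-memory dict keyed by (namespace, _id). It is deliberately simplified. The entry fields (`ts`, `op`, `ns`, `o`, `o2`) follow the oplog's layout, but here timestamps are plain integers and every update is assumed to be a full-document replacement. The real oplog uses BSON timestamps and also records modifier-style updates such as $set.

```python
from typing import Any, Dict, Iterable, Tuple

Key = Tuple[str, Any]            # (namespace, _id)
Store = Dict[Key, Dict[str, Any]]


def replay(snapshot: Store, oplog: Iterable[Dict[str, Any]], since_ts: int) -> Store:
    """Apply every oplog entry newer than `since_ts` on top of a snapshot copy.

    Simplifications (assumptions, not the real oplog format): integer
    timestamps, and updates ("u") carry a full replacement document in "o"
    with the target selector in "o2".
    """
    data: Store = {k: dict(v) for k, v in snapshot.items()}  # keep the snapshot intact
    for entry in sorted(oplog, key=lambda e: e["ts"]):
        if entry["ts"] <= since_ts:
            continue  # already contained in the snapshot
        ns, op = entry["ns"], entry["op"]
        if op == "i":                      # insert
            doc = entry["o"]
            data[(ns, doc["_id"])] = dict(doc)
        elif op == "u":                    # full-document replacement
            _id = entry["o2"]["_id"]
            data[(ns, _id)] = {**entry["o"], "_id": _id}
        elif op == "d":                    # delete
            data.pop((ns, entry["o"]["_id"]), None)
        # "n" (no-op) entries change nothing
    return data
```

Replaying is only safe if the oplog still reaches back to the snapshot's timestamp. That is one more reason to size the oplog generously.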

Tools to be familiar with

• mongostat
  – Look at queue lengths, memory, connections, and operation mix
• db.serverStatus()
  – Server status with sync, page faults, locks, index misses
• atop
• iostat
• db.stats()
  – Overall info at the database level
• db.<coll_name>.stats()
  – Overall info at the collection level
• db.printReplicationInfo()
  – Info about the oplog size and time
• db.printSlaveReplicationInfo()
  – Info about the master, the last sync timestamp, and how far the slave is behind the master
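Most of the numbers on this list come straight from db.serverStatus(). The Python sketch below pulls out the ones these slides call out: write lock %, queue length, index misses, page faults, and connections. The field names assume the older (1.x) serverStatus layout: `globalLock.lockTime`/`totalTime`, `globalLock.currentQueue`, `indexCounters.btree`, and `extra_info.page_faults`. These fields have changed across versions, so check them against your own server's output. The counters are cumulative since startup. For that reason, the second function diffs two samples to get the lock percentage over an interval, which is closer to what mongostat reports.

```python
from typing import Any, Dict


def summarize(status: Dict[str, Any]) -> Dict[str, Any]:
    """Headline health numbers from one serverStatus document (since startup)."""
    lock = status["globalLock"]
    btree = status["indexCounters"]["btree"]
    accesses = btree["accesses"]
    return {
        "write_lock_pct": 100.0 * lock["lockTime"] / lock["totalTime"],
        "queued_ops": lock["currentQueue"]["total"],
        "index_miss_pct": 100.0 * btree["misses"] / accesses if accesses else 0.0,
        "page_faults": status.get("extra_info", {}).get("page_faults", 0),
        "connections": status["connections"]["current"],
    }


def lock_pct_between(prev: Dict[str, Any], curr: Dict[str, Any]) -> float:
    """Write lock % over the interval between two serverStatus samples."""
    d_total = curr["globalLock"]["totalTime"] - prev["globalLock"]["totalTime"]
    d_lock = curr["globalLock"]["lockTime"] - prev["globalLock"]["lockTime"]
    return 100.0 * d_lock / d_total if d_total else 0.0


if __name__ == "__main__":
    # Hypothetical sample shaped like an older serverStatus document.
    sample = {
        "globalLock": {"totalTime": 1_000_000, "lockTime": 250_000,
                       "currentQueue": {"total": 3}},
        "indexCounters": {"btree": {"accesses": 200, "misses": 5}},
        "extra_info": {"page_faults": 42},
        "connections": {"current": 180},
    }
    print(summarize(sample))
```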

Challenges with ActivityStreams

• Lots of data!
  – Queries return large result sets
• Reverse sorting
  – The data has to be sorted in reverse natural order ($natural: -1), and we do not use capped collections
• Aggregation of similar activities
  – Impacts pagination
• Fetching self activities (profile) and the newsfeed (self + others)
• Filtering based on the activity type
  – People want to see game updates or blog updates from their friends
• Hydration of activities for dynamic data
  – The thumbnail and level of the actor may change
• Comments
  – When an activity is rendered, the initial comments and the comment count have to be pulled ($slice)

TODO: Rant about missing $size operator
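The Python sketch below makes several of these points concrete: profile vs. newsfeed, type filtering, reverse order, and $slice. It only builds the query documents (filter, projection, sort, limit) that you would hand to a driver. The document shape is hypothetical. `actor.id`, `object.type`, and an embedded `comments` array are illustrative field names, not IGN's actual schema.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Query = Tuple[Dict[str, Any], Dict[str, Any], List[Tuple[str, int]], int]


def _page(flt: Dict[str, Any], comments_preview: int, page_size: int) -> Query:
    projection = {"comments": {"$slice": comments_preview}}  # first N comments only
    sort = [("$natural", -1)]                                # newest first
    return flt, projection, sort, page_size


def profile_query(user_id: str, comments_preview: int = 3, page_size: int = 20) -> Query:
    """Self activities for a profile page."""
    return _page({"actor.id": user_id}, comments_preview, page_size)


def newsfeed_query(user_id: str, friend_ids: Sequence[str],
                   activity_types: Optional[Sequence[str]] = None,
                   comments_preview: int = 3, page_size: int = 20) -> Query:
    """Newsfeed = self + friends, optionally filtered by activity type."""
    flt: Dict[str, Any] = {"actor.id": {"$in": [user_id, *friend_ids]}}
    if activity_types:
        flt["object.type"] = {"$in": list(activity_types)}
    return _page(flt, comments_preview, page_size)
```

With PyMongo, for example, these map onto `collection.find(flt, projection).sort(sort).limit(limit)`. A projection that contains only `$slice` still returns the activity's other fields. The slice does not tell you how many comments there are in total. Without a $size projection, a denormalized counter field is one way to show the count.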

ActivityStreams

• Each activity has an ACTOR
• Each actor has a TYPE
• Each actor performs an action; that action is called a VERB
• Each VERB can act upon many objects, called ACTIVITYOBJECTS
• Some VERBs may involve a target, called an ACTIVITYTARGET
• Every entity (Actor, ActivityObject, ActivityTarget) has links to define it

Examples:

A writes ‘Hello!’ on B’s wall
Actor => A, ActivityObject => ‘Hello!’ of type WALL_POST, ActivityTarget => B, VERB => POST

A follows a game B
Actor => A, ActivityObject => B of type MEDIA_ITEM, ActivityTarget => null, VERB => FOLLOW

…and it gets complicated as we go down the rabbit hole!
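The ACTOR / VERB / ACTIVITYOBJECT / ACTIVITYTARGET model maps naturally onto one embedded document per activity. The Python sketch below builds the two examples above using dataclasses. The `PERSON` type, the `content` field, and the `post-1` id are assumptions added for illustration; the slide only names the WALL_POST and MEDIA_ITEM types.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Entity:
    """Actor, ActivityObject or ActivityTarget: each has a type and links."""
    id: str
    type: str
    content: Optional[str] = None
    links: List[str] = field(default_factory=list)


@dataclass
class Activity:
    actor: Entity
    verb: str
    objects: List[Entity]             # a VERB can act upon many objects
    target: Optional[Entity] = None   # only some VERBs involve a target

    def to_document(self) -> Dict[str, Any]:
        """Nested dict, ready to be stored as a single document."""
        return asdict(self)


# A writes 'Hello!' on B's wall
wall_post = Activity(
    actor=Entity("A", "PERSON"),
    verb="POST",
    objects=[Entity("post-1", "WALL_POST", content="Hello!")],
    target=Entity("B", "PERSON"),
)

# A follows a game B
follow = Activity(actor=Entity("A", "PERSON"), verb="FOLLOW",
                  objects=[Entity("B", "MEDIA_ITEM")])
```

Embedding the actor, objects, and target in the activity keeps reads cheap. It is also why the actor's thumbnail and level must be re-hydrated at render time.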

Caching using MongoDB

• Caching the entire streams
  – A bad idea (or a bad implementation?)
  – The expired objects sat in the DB, bloating the database
  – Removing them did not free up disk space, so we ran out of it
• Use Mongo as a cache-key index
  – Cache the streams in memcached
  – For invalidation, keep the index of the memcached keys in MongoDB
  – Works!
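Below is a minimal sketch of the pattern that worked. The rendered streams live in memcached. MongoDB only keeps, for each user, the list of cache keys that include that user's activities, so a new activity can invalidate exactly those keys. In this Python version, both stores are in-memory stand-ins. The key names (`feed:u1`) and the index document shape in the comment are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Set


class StreamCache:
    def __init__(self) -> None:
        # Stand-in for memcached: cache key -> rendered stream.
        self.memcached: Dict[str, List[str]] = {}
        # Stand-in for the MongoDB index collection: actor id -> cache keys.
        # In MongoDB this could be a document like {"_id": actor_id, "keys": [...]}
        # maintained with $addToSet.
        self.key_index: DefaultDict[str, Set[str]] = defaultdict(set)

    def get_stream(self, key: str, actor_ids: List[str],
                   load: Callable[[List[str]], List[str]]) -> List[str]:
        """Return a cached stream, building it (and indexing its key) on a miss."""
        if key not in self.memcached:
            self.memcached[key] = load(actor_ids)
            for actor in actor_ids:
                self.key_index[actor].add(key)
        return self.memcached[key]

    def invalidate_actor(self, actor_id: str) -> int:
        """Drop every cached stream that contains this actor's activities."""
        keys = self.key_index.pop(actor_id, set())
        for key in keys:
            # Other actors' index entries may still list this key; deleting a
            # missing key later is harmless, so they are not cleaned up here.
            self.memcached.pop(key, None)
        return len(keys)
```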

What we’ve learned

• Keep an eye on
  – Page faults
  – Index misses
  – Queue lengths
  – Database sizes on disk (freed space is reused, not released back to the OS)
• Use .explain()
  – Watch for nscanned and indexBounds
• Use limit() when using find()
• While updating, try to load the object into memory first so that it is in the working set (findAndModify)
• Keep the fields being selected to a minimum
• Replicate and denormalize instead of using write concerns
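For the .explain() advice, the Python sketch below runs a small check over an explain result. It assumes the older (1.x) explain output fields: `cursor`, `n`, and `nscanned`. A `cursor` of `BasicCursor` means a full collection scan, and `BtreeCursor <index>` means an index was used. The 10x scanned-to-returned threshold is an arbitrary example value, and the index name in the demo is made up. indexBounds still deserves a look by eye, because wide bounds mean the index is not narrowing the query much.

```python
from typing import Any, Dict, List


def explain_warnings(plan: Dict[str, Any], max_scan_ratio: float = 10.0) -> List[str]:
    """Flag common red flags in a 1.x-style explain() result."""
    warnings: List[str] = []
    if plan.get("cursor", "").startswith("BasicCursor"):
        warnings.append("full collection scan: no index used")
    n = plan.get("n", 0)
    nscanned = plan.get("nscanned", 0)
    # Scanning far more entries than are returned means the index (if any)
    # is not selective enough for this query.
    if nscanned > max_scan_ratio * max(n, 1):
        warnings.append(f"scanned {nscanned} entries to return {n} documents")
    return warnings


if __name__ == "__main__":
    good = {"cursor": "BtreeCursor actor.id_1", "n": 20, "nscanned": 20}
    bad = {"cursor": "BasicCursor", "n": 2, "nscanned": 50000}
    print(explain_warnings(good))  # []
    print(explain_warnings(bad))
```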

Near-term plans

• Move to replica sets
• Move relationship graphs to MongoDB
  – Shard the relationships based on the userId
• Run multiple mongo processes, splitting out collections among multiple databases
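Sharding relationships on userId puts every edge for a given user in the same chunk, so a single shard can answer "who are my friends?". MongoDB's sharding splits the shard key's range into chunks. The Python toy below models only the routing step: a sorted list of chunk lower bounds and a binary search to find the owning shard. The string user ids, chunk boundaries, and shard names are hypothetical. A real cluster also splits and migrates chunks, which this model ignores.

```python
import bisect
from typing import Dict, List, Sequence


class RangeRouter:
    """Toy range-based router: chunk i covers [lower_bounds[i], lower_bounds[i+1])."""

    def __init__(self, lower_bounds: Sequence[str], shards: Sequence[str]) -> None:
        if len(lower_bounds) != len(shards):
            raise ValueError("need exactly one shard per chunk")
        if list(lower_bounds) != sorted(lower_bounds):
            raise ValueError("chunk lower bounds must be sorted")
        self.lower_bounds: List[str] = list(lower_bounds)
        self.shards: List[str] = list(shards)

    def shard_for(self, user_id: str) -> str:
        """Binary-search the chunk whose range contains user_id."""
        i = bisect.bisect_right(self.lower_bounds, user_id) - 1
        if i < 0:
            raise KeyError(f"{user_id!r} sorts below the first chunk")
        return self.shards[i]


def relationship_doc(user_id: str, friend_id: str) -> Dict[str, str]:
    """One edge of the relationship graph, keyed by the owning user's id."""
    return {"userId": user_id, "friendId": friend_id}


if __name__ == "__main__":
    router = RangeRouter(["", "m"], ["shard0", "shard1"])
    edge = relationship_doc("alice", "zoe")
    # The edge lives with its owner, even though "zoe" maps to shard1.
    print(router.shard_for(edge["userId"]))  # shard0
```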

Wishlist

• Respect indexes in $or queries
• A $size operator for arrays
• $inc when doing $addToSet
• Defragmentation when removing data
• Concurrency: too many write lock conditions
• A decent start/stop script
• Load balancing in the driver (round robin) for reads
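Until drivers support this, round-robin reads can be approximated on the client side. In the Python sketch below, writes always go to the master and reads cycle through the slaves. A lock keeps the rotation consistent when several threads share one router. The host names are hypothetical. The sketch does not handle failed hosts or replication lag, both of which a real implementation would need.

```python
import threading
from typing import List, Sequence


class RoundRobinReads:
    """Pick a host per operation: master for writes, slaves in turn for reads."""

    def __init__(self, master: str, slaves: Sequence[str]) -> None:
        self.master = master
        self.slaves: List[str] = list(slaves)
        self._next = 0
        self._lock = threading.Lock()

    def host_for_write(self) -> str:
        return self.master

    def host_for_read(self) -> str:
        if not self.slaves:
            return self.master  # no slaves configured: read from the master
        with self._lock:
            host = self.slaves[self._next % len(self.slaves)]
            self._next += 1
        return host


if __name__ == "__main__":
    router = RoundRobinReads("db-master:27017", ["db-slave1:27017", "db-slave2:27017"])
    print([router.host_for_read() for _ in range(3)])
```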

We are hiring

• Software engineers to help us with exciting initiatives at IGN
• Technologies we use
  – RoR, Java (no J2EE!), Spring, PHP/Zend, jQuery
  – HTML5, CSS3, Sencha Touch, PhoneGap
  – MongoDB, memcached, Solr

http://corp.ign.com


Questions

References

• IGN’s Social Platform
  – http://my.ign.com
  – http://people.ign.com/ign-labs
• Mongo Munin plugins
  – https://github.com/erh/mongo-munin
  – https://github.com/lobster1234/munin-mongo-collections
• Morphia
  – http://code.google.com/p/morphia/
