mongodb to cassandra

MongoDB to Cassandra: The Atlas Odyssey

Fred van den Driessche, Engineer, @fredvdd

Tom McAdam, CTO, @tfm

Adam Horwich, Systems Engineer, @Mmmkayness

http://flickr.com/photos/dhammza/88644497/

tbc

Video and audio metadata from 20+ sources

Profiles and activity from video and audio products, social networks

Our platform - late 2012

tbc

MetaBroadcast platform

Analytic requests and groupings

Main clients Main Partners

Data Partners

What is Atlas?

[Diagram labels: sources BBC, PA, C4, etc...; ATLAS with its DB; /content, /schedules, /topics; sitemaps, radioplayer, interlinking]

Atlas Data Model

brand item

series version

broadcast location

MongoDB

• flexible

• features

• really simple

• shell

Where MongoDB falls short

• too simple

• lack of control

• sharding

• embedding

Where to?


• add a cache?

Atlas API

• content

• http://atlas.metabroadcast.com/3.0/content.json?uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations&apiKey=6ed2a984627daff816198acde82

• http://atlas.metabroadcast.com/3.0/content.json?apiKey=aaaa&uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations

• schedules

• http://atlas.metabroadcast.com/3.0/schedule.json?from=now&to=now.plus.3h&channel=bbcone&publisher=bbc.co.uk

• http://atlas.metabroadcast.com/3.0/schedule.json?from=1948-12-24&to=1948-12-25&channel=radio4&publisher=bbc.co.uk

• api explorer http://atlas.metabroadcast.com/#apiExplorer
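The schedule URLs above follow a simple endpoint-plus-query-string pattern. A minimal sketch of assembling one in Java follows; the class and method names (AtlasUrls, build, schedule) are hypothetical and not part of any Atlas client, and only the base URL and parameter names are taken from the examples on this slide.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds Atlas 3.0 request URLs like the ones on the slide.
final class AtlasUrls {
    static final String BASE = "http://atlas.metabroadcast.com/3.0/";

    // Joins the parameters in insertion order, URL-encoding each value.
    static String build(String endpoint, Map<String, String> params) {
        String query = params.entrySet().stream()
            .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return BASE + endpoint + ".json?" + query;
    }

    // Schedule query using the four parameters shown in the slide's examples.
    static String schedule(String from, String to, String channel, String publisher) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("from", from);
        params.put("to", to);
        params.put("channel", channel);
        params.put("publisher", publisher);
        return build("schedule", params);
    }

    public static void main(String[] args) {
        System.out.println(schedule("now", "now.plus.3h", "bbcone", "bbc.co.uk"));
    }
}
```

Values containing characters such as ":" or "/" (for example the uri parameter of the content endpoint) come out percent-encoded with this sketch, whereas the slide shows them unencoded.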


Why Cassandra?

• scalability/performance

• row caches

• consistency control

• column-based model matches our use case

And?

• ElasticSearch

• messaging

• tooling: bootstraps

What is Atlas?

[Diagram labels: sources BBC, PA, C4, etc...; Data ingest server; DB; Update bus; ES; HTTP server]

Data model

• columns to model annotations

• secondary indexes

    index.direct(keyspace, SEGMENT_URI_INDEX_CF, ConsistencyLevel.CL_QUORUM)
        .from(segment.getCanonicalUri())
        .to(segment.getIdentifier())
        .index()
        .execute(requestTimeout, TimeUnit.MILLISECONDS);

ID generation

• give external data our own ID on ingest

• needs to be user-friendly: http://www.radiotimes.com/programme/cf2/eastenders

• mongo: findAndModify()

• solution: uses Astyanax client with its distributed locking

• more details: http://metabroadcast.com/blog/let-cassandra-identify-your-data
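To make the idea concrete, here is a rough sketch of lock-guarded ID allocation with a short, URL-friendly rendering. Everything in it is an assumption for illustration: the in-process ReentrantLock merely stands in for the Astyanax distributed lock, the counter lives in memory rather than in Cassandra, and base 36 is a guessed encoding, since the slides do not say how Atlas renders its IDs.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: read-increment-write of a counter under a lock,
// then render the number as a short lowercase string.
final class FriendlyIdGenerator {
    // Stand-in for a distributed lock; a real cluster needs a cross-node lock.
    private final ReentrantLock lock = new ReentrantLock();
    // Stand-in for the last allocated ID, which would be stored in the database.
    private long lastId;

    FriendlyIdGenerator(long lastId) {
        this.lastId = lastId;
    }

    // Allocates the next ID while holding the lock so no two callers get the same value.
    String next() {
        lock.lock();
        try {
            lastId += 1;
            return encode(lastId);
        } finally {
            lock.unlock();
        }
    }

    // Assumed encoding: base 36 (digits then lowercase letters).
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        return Long.toString(id, 36);
    }

    static long decode(String friendly) {
        return Long.parseLong(friendly, 36);
    }
}
```

The lock is what replaces MongoDB's atomic findAndModify() here: without it, two ingest workers could read the same counter value and hand out the same ID.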





Where we’re at

• already live with some data

• alpha release of schedule endpoint coming soon

• later: roll out across other endpoints

Ops in Cassandra

• we love Puppet

• it’s great for automation and deployment

• MongoDB: 1 file

• Cassandra: 2 files!

• oh... tokens

Cassandra Tokens

• define where data is written to in a cluster

• therefore balanced tokens = balanced cluster

• tokens should be rack aware

• tools available to provide appropriate tokens for you
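As a rough illustration of what such a tool computes, the sketch below spaces initial tokens evenly around the ring. It assumes the RandomPartitioner, whose token range is 0 to 2^127; other partitioners use a different range. The class name BalancedTokens is hypothetical, and the sketch ignores rack awareness entirely.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: evenly spaced initial tokens, assuming RandomPartitioner.
final class BalancedTokens {
    // Size of the RandomPartitioner token ring (assumption: 2^127).
    private static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    // Token for node i of n is i * 2^127 / n.
    static List<BigInteger> forNodes(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("nodeCount must be at least 1");
        }
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(RING_SIZE
                .multiply(BigInteger.valueOf(i))
                .divide(BigInteger.valueOf(nodeCount)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // One token per line, e.g. to paste into each node's initial_token setting.
        forNodes(4).forEach(System.out::println);
    }
}
```

Each node owns the range ending at its token, so even spacing gives each node an equal share of the ring. Adding a node later means recomputing the spacing and moving existing tokens, which is why rebalancing shows up as an ops concern.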

Cassandra plays nicely with AWS

• datacentre / rack aware

• AWS Region = Datacentre

• AWS Availability Zone = Rack

• only recently introduced in MongoDB but simple to implement in Cassandra

• horizontally (and vertically) scalable

Monitoring

• Nagios is a little threadbare for Cassandra

• basic TCP service check

• stats from API not very helpful

• nodetool and CLI tools useful

• manual effort to integrate them

• if only there was some useful service...

OpsCenter

• wonderful for an overview

• not so much for alerting ;)

• ohai API

• can integrate metrics into Nagios

Disaster Recovery

• we currently operate a 4-node cluster

• replication factor of 3 with quorum reads and writes

• DR complicated by tokens

• cluster should be balanced

• snapshot + S3 Backups
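The arithmetic behind "replication factor of 3 with quorum" can be sketched as below. The class name Quorum is hypothetical; the formulas are the usual ones (quorum is a majority of replicas, and reads see the latest write when read replicas plus write replicas exceed the replication factor).

```java
// Hypothetical sketch of quorum arithmetic for a given replication factor.
final class Quorum {
    // A quorum is a majority of the replicas: floor(rf / 2) + 1.
    static int size(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Replicas that can be down while quorum operations still succeed.
    static int toleratedFailures(int replicationFactor) {
        return replicationFactor - size(replicationFactor);
    }

    // Reads overlap the latest write when R + W > RF.
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        System.out.println("quorum size: " + size(rf));
        System.out.println("tolerated replica failures: " + toleratedFailures(rf));
    }
}
```

With a replication factor of 3, a quorum is 2 replicas, so one replica of any given row can be unavailable while quorum reads and writes continue to succeed.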

Cluster Happiness and Headaches

• little maintenance overhead

• cluster rebalancing

• uncommon maintenance procedure

• schema changes are cumbersome

• little scope for rollback, can put cluster in unrecoverable state

Summary

• Mongo is good, Atlas has outgrown it

• Cassandra isn’t a drop-in replacement

• Ops more complex but so far so good

Questions?
