MongoDB to Cassandra

MongoDB to Cassandra The Atlas Odyssey Fred van den Driessche Engineer @fredvdd Tom McAdam CTO @tfm Adam Horwich Systems Engineer @Mmmkayness

Upload: fredvdd

Post on 05-Dec-2014

DESCRIPTION

An overview of the MetaBroadcast team's experience of moving from MongoDB to Cassandra.

TRANSCRIPT

Page 1: MongoDB to Cassandra

MongoDB to Cassandra: The Atlas Odyssey

Fred van den Driessche, Engineer, @fredvdd

Tom McAdam, CTO, @tfm

Adam Horwich, Systems Engineer, @Mmmkayness

Page 2: MongoDB to Cassandra
Page 3: MongoDB to Cassandra

http://flickr.com/photos/dhammza/88644497/

Page 4: MongoDB to Cassandra

tbc

Video and audio metadata from 20+ sources

Profiles and activity from video and audio products, social networks

Our platform - late 2012

tbc

MetaBroadcast platform

Analytic requests and groupings

Page 5: MongoDB to Cassandra

?

Page 6: MongoDB to Cassandra

Main clients

Main partners

Data Partners

Page 7: MongoDB to Cassandra

What is Atlas?

ATLAS

DB

BBC

PA

C4

etc...

/content

/schedules

/topics

sitemaps

radioplayer

interlinking

Page 8: MongoDB to Cassandra

DEMO

Page 9: MongoDB to Cassandra

Atlas Data Model

[Diagram: brand, series, item, version, broadcast, location]

Page 10: MongoDB to Cassandra

MongoDB

• flexible

• features

• really simple

• shell

Page 11: MongoDB to Cassandra

Where MongoDB falls short

• too simple

• lack of control

• sharding

• embedding

Page 12: MongoDB to Cassandra

Where to?

Page 13: MongoDB to Cassandra

Where to?

• add a cache?

Page 14: MongoDB to Cassandra

Atlas API

• content

• http://atlas.metabroadcast.com/3.0/content.json?uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations&apiKey=6ed2a984627daff816198acde82

• http://atlas.metabroadcast.com/3.0/content.json?apiKey=aaaa&uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations

• schedules

• http://atlas.metabroadcast.com/3.0/schedule.json?from=now&to=now.plus.3h&channel=bbcone&publisher=bbc.co.uk

• http://atlas.metabroadcast.com/3.0/schedule.json?from=1948-12-24&to=1948-12-25&channel=radio4&publisher=bbc.co.uk

• api explorer http://atlas.metabroadcast.com/#apiExplorer
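The schedule URLs on this slide follow a simple pattern: a base address, an endpoint, and a list of query parameters. A minimal sketch of assembling one in Java is below. The class and method names are hypothetical and are not part of any Atlas client library; only the base URL and the parameter names (from, to, channel, publisher) come from the slide.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for illustration; not an official Atlas client. */
final class AtlasUrlSketch {
    // Base address as shown on the slide.
    static final String BASE = "http://atlas.metabroadcast.com/3.0/";

    /** Joins an endpoint such as "schedule.json" with percent-encoded query parameters. */
    static String build(String endpoint, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return BASE + endpoint + "?" + query;
    }

    /** Builds a schedule request using the parameter names from the slide. */
    static String scheduleUrl(String from, String to, String channel, String publisher) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("from", from);
        params.put("to", to);
        params.put("channel", channel);
        params.put("publisher", publisher);
        return build("schedule.json", params);
    }
}
```

Calling AtlasUrlSketch.scheduleUrl("now", "now.plus.3h", "bbcone", "bbc.co.uk") reproduces the first schedule URL above. Values such as a content uri would be percent-encoded by this helper, which differs from the unencoded form shown on the slide.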

Page 15: MongoDB to Cassandra
Page 16: MongoDB to Cassandra

Page 17: MongoDB to Cassandra

Why Cassandra?

• scalability/performance

• row caches

• consistency control

• column-based model matches our use case

Page 18: MongoDB to Cassandra

And?

• ElasticSearch

• messaging

• tooling: bootstraps

Page 19: MongoDB to Cassandra

What is Atlas?

BBC

PA

C4

etc...

Data ingest server DB

Update bus

ES

HTTP server

Page 20: MongoDB to Cassandra

Data model

• columns to model annotations

• secondary indexes

• index.direct(keyspace, SEGMENT_URI_INDEX_CF, ConsistencyLevel.CL_QUORUM)
    .from(segment.getCanonicalUri())
    .to(segment.getIdentifier())
    .index()
    .execute(requestTimeout, TimeUnit.MILLISECONDS);
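The snippet on this slide writes an index entry from a segment's canonical URI to its identifier. The idea behind such a hand-maintained secondary index can be sketched without Cassandra at all: one map stands in for the main column family and a second map stands in for the index column family. Everything below is an in-memory illustration under that assumption; the class and method names are hypothetical and this is not the Atlas code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** In-memory illustration of a hand-maintained secondary index. */
final class UriIndexSketch {
    // Stands in for the main column family, keyed by our own identifier.
    private final Map<Long, String> segmentsById = new HashMap<>();
    // Stands in for the index column family: canonical URI -> identifier.
    private final Map<String, Long> idByCanonicalUri = new HashMap<>();

    /** Writes the index entry alongside the data so lookups by URI stay possible. */
    void write(long id, String canonicalUri, String payload) {
        idByCanonicalUri.put(canonicalUri, id);
        segmentsById.put(id, payload);
    }

    /** Resolves a URI to an identifier via the index, then reads the row by identifier. */
    Optional<String> findByUri(String canonicalUri) {
        Long id = idByCanonicalUri.get(canonicalUri);
        return id == null ? Optional.empty() : Optional.ofNullable(segmentsById.get(id));
    }
}
```

A lookup by URI therefore costs two reads (index, then row), which is the trade-off for keeping the main data keyed by identifier.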

Page 21: MongoDB to Cassandra

ID generation

• give external data our own ID on ingest

• needs to be user-friendly: http://www.radiotimes.com/programme/cf2/eastenders

• mongo: findAndModify()

• solution: use the Astyanax client's distributed locking

• more details: http://metabroadcast.com/blog/let-cassandra-identify-your-data
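To make the requirement concrete, here is a minimal sketch of turning a monotonically increasing counter into a short, URL-friendly ID such as the "cf2" in the link above. It makes several assumptions: the alphabet is invented for this example and is not the Atlas one, and an in-memory AtomicLong stands in for the counter that would be read and incremented under a distributed lock in Cassandra. All names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative ID generator; the alphabet and names are assumptions. */
final class IdSketch {
    // Hypothetical alphabet: lowercase consonants and digits 2-9 (28 symbols).
    private static final String ALPHABET = "bcdfghjkmnpqrstvwxyz23456789";
    private static final int BASE = ALPHABET.length();

    // In-memory stand-in for a counter row guarded by a distributed lock.
    private final AtomicLong counter = new AtomicLong();

    /** Takes the next counter value and encodes it as a short string. */
    String nextId() {
        return encode(counter.incrementAndGet());
    }

    /** Encodes a non-negative number in base 28 using the alphabet above. */
    static String encode(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        if (n == 0) {
            return String.valueOf(ALPHABET.charAt(0));
        }
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            sb.append(ALPHABET.charAt((int) (n % BASE)));
            n /= BASE;
        }
        return sb.reverse().toString();
    }

    /** Reverses encode(); rejects characters outside the alphabet. */
    static long decode(String id) {
        long value = 0;
        for (int i = 0; i < id.length(); i++) {
            int digit = ALPHABET.indexOf(id.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("unexpected character: " + id.charAt(i));
            }
            value = value * BASE + digit;
        }
        return value;
    }
}
```

The part this sketch leaves out is the hard one: with MongoDB the increment was a single findAndModify() call, whereas on Cassandra the read-increment-write has to be serialised across nodes, which is where the locking comes in.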

Page 22: MongoDB to Cassandra

Where we’re at

• already live with some data

• alpha release of schedule endpoint coming soon

• later: roll out across other endpoints

Page 23: MongoDB to Cassandra

Ops

Page 24: MongoDB to Cassandra

Ops in Cassandra

• we love Puppet

• it’s great for automation and deployment

• MongoDB: 1 file

• Cassandra: 2 files!

• oh... tokens

Page 25: MongoDB to Cassandra

Cassandra Tokens

• define where data is written to in a cluster

• therefore balanced tokens = balanced cluster

• tokens should be rack aware

• tools available to provide appropriate tokens for you
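As an illustration of what "balanced tokens" means, the sketch below spaces tokens evenly around the ring. It assumes the RandomPartitioner, whose token range runs from 0 to 2^127; other partitioners use different ranges, and this sketch does not attempt the rack-aware ordering mentioned above. The class name is hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Evenly spaced initial tokens, assuming RandomPartitioner's 0..2^127 range. */
final class TokenSketch {
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    /** Returns one token per node: token(i) = i * 2^127 / nodeCount. */
    static List<BigInteger> balancedTokens(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("nodeCount must be at least 1");
        }
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(RING_SIZE.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(nodeCount)));
        }
        return tokens;
    }
}
```

Each value would go into the initial_token setting of one node. Adding or removing a node changes every ideal token, which is why rebalancing is a deliberate maintenance step.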

Page 26: MongoDB to Cassandra

Cassandra plays nicely with AWS

• datacentre / rack aware

• AWS Region = Datacentre

• AWS Availability Zone = Rack

• only recently introduced in MongoDB but simple to implement in Cassandra

• horizontally (and vertically) scalable
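The region and availability zone mapping can be sketched as a small function. This is a simplified illustration that splits an availability zone name such as "eu-west-1a" into a region and a zone letter; Cassandra's own EC2 snitches derive their datacentre and rack names slightly differently, so the exact strings here are an assumption, as is the class name.

```java
/** Simplified mapping from an AWS availability zone name to datacentre and rack. */
final class AwsTopologySketch {

    /** "eu-west-1a" -> "eu-west-1": the region is treated as the datacentre. */
    static String datacentreOf(String availabilityZone) {
        requireZone(availabilityZone);
        return availabilityZone.substring(0, availabilityZone.length() - 1);
    }

    /** "eu-west-1a" -> "a": the zone letter is treated as the rack. */
    static String rackOf(String availabilityZone) {
        requireZone(availabilityZone);
        return availabilityZone.substring(availabilityZone.length() - 1);
    }

    // Accepts only names that end in a zone letter, e.g. "us-east-1b".
    private static void requireZone(String az) {
        if (az == null || az.length() < 2 || !Character.isLetter(az.charAt(az.length() - 1))) {
            throw new IllegalArgumentException("not an availability zone name: " + az);
        }
    }
}
```

With replicas spread across racks, losing a single availability zone should still leave copies of each row in the other zones.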

Page 27: MongoDB to Cassandra

Monitoring

• Nagios is a little threadbare for Cassandra

• basic TCP service check

• stats from API not very helpful

• nodetool and CLI tools useful

• manual effort to integrate them

• if only there was some useful service...
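For reference, the "basic TCP service check" amounts to little more than the sketch below: try to open a socket and map the result to a Nagios plugin status (0 for OK, 2 for CRITICAL). The class name is hypothetical, and the host, port and timeout are whatever the caller supplies; 9160 is Cassandra's default Thrift port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Minimal TCP reachability check using Nagios plugin status codes. */
final class TcpCheckSketch {
    static final int OK = 0;        // Nagios: service is fine
    static final int CRITICAL = 2;  // Nagios: service is down

    /** Returns OK if a TCP connection can be opened within the timeout, else CRITICAL. */
    static int check(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return OK;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable hosts.
            return CRITICAL;
        }
    }
}
```

A check like this only shows that the port is open. It says nothing about ring state, pending compactions or latency, which is the gap the rest of this slide describes.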

Page 28: MongoDB to Cassandra

OpsCenter

• wonderful for an overview

• not so much for alerting ;)

• ohai API

• can integrate metrics into Nagios

Page 29: MongoDB to Cassandra

Disaster Recovery

• we currently operate a 4-node cluster

• replication factor of 3 with quorum reads/writes

• DR complicated by tokens

• cluster should be balanced

• snapshot + S3 Backups
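The arithmetic behind "replication factor of 3 with quorum" is worth spelling out. A quorum is a majority of replicas, so with three replicas both reads and writes need two acknowledgements, and one replica of any row can be down without failing requests. The sketch below encodes just that arithmetic; the class and method names are hypothetical.

```java
/** Quorum arithmetic for a given replication factor. */
final class QuorumSketch {

    /** A quorum is a strict majority of the replicas: floor(rf / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** A read overlaps the latest write when read replicas + write replicas exceed the replication factor. */
    static boolean readsSeeLatestWrite(int replicationFactor, int writeReplicas, int readReplicas) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    /** How many replicas of a row can be down while quorum requests still succeed. */
    static int tolerableReplicaFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }
}
```

For a replication factor of 3 this gives a quorum of 2 and a tolerance of 1 failed replica per row, and quorum reads combined with quorum writes satisfy the overlap condition (2 + 2 > 3).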

Page 30: MongoDB to Cassandra

Cluster Happiness and Headaches

• little maintenance overhead

• cluster rebalancing

• uncommon maintenance procedure

• schema changes are cumbersome

• little scope for rollback, can put cluster in unrecoverable state

Page 31: MongoDB to Cassandra

Summary

• Mongo is good, Atlas has outgrown it

• Cassandra isn’t a drop-in replacement

• Ops more complex but so far so good

Page 32: MongoDB to Cassandra

Questions?