Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons

MONGODB TO CASSANDRA ARCHITECTURAL LESSONS: Jon Haddad & Blake Eggleston

Uploaded by datastax on 26-Jan-2015

DESCRIPTION

We'll be covering some aspects of our architecture, highlighting differences between MongoDB and Cassandra. We'll go in depth to explain why Cassandra is a better choice for our general purpose Application Platform (SHIFT) as well as our Media Buying Analytics tool (the SHIFT Media Manager). We'll be going over common design patterns people might be familiar with coming from a background with MongoDB and highlight how Cassandra would be used as a better alternative. We'll also touch more on cqlengine which is nearing feature completeness as the Cassandra object mapper for Python.

TRANSCRIPT

Page 1: Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons

MONGODB TO CASSANDRA ARCHITECTURAL LESSONS

Jon Haddad & Blake Eggleston

Overview

• Differences in DB Architectures
• SHIFT Platform
• SHIFT Media Manager
• Intro to cqlengine

MongoDB Architecture

Important Concepts
• Replica set (master / slave)
• Shard (a replica set holding a subset of the cluster's data)
• Config server (stores the cluster topology)
• mongos (query router)
• Shard key: an indexed field that determines which shard a particular document belongs to

sources: http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/, http://docs.mongodb.org/manual/core/sharding-shard-key/

Cassandra Architecture

• Only 1 type of server (Cassandra)
• Ring-based replication (no master or slave)
• No single point of failure
• Key hashes to a location in the ring
• Replication factor (e.g. RF=3)
• Limited query flexibility (always select by key)
• Each query has a consistency level

source: http://developer.rackspace.com/images/2013-03-27-rackspace-service-registry-status-update/vnodes.png
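The placement rules on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's partitioner: it hashes keys with MD5 (Cassandra 1.2 defaults to Murmur3), gives each hypothetical node a single token instead of vnodes, and places replicas SimpleStrategy-style by walking clockwise from the key's token. The `Ring`, `token` and `quorum` names are illustrative only.

```python
import bisect
import hashlib
from typing import List


def token(value: str) -> int:
    """Hash a string onto a 64-bit ring (MD5 stands in for Cassandra's partitioner)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: one token per node, replicas chosen clockwise (SimpleStrategy-like)."""

    def __init__(self, nodes: List[str], replication_factor: int = 3) -> None:
        self.rf = replication_factor
        # Sort (token, node) pairs so list order is ring order.
        self.tokens = sorted((token(node), node) for node in nodes)
        self.positions = [t for t, _ in self.tokens]

    def replicas(self, key: str) -> List[str]:
        """First node at or after the key's token owns it; the next RF-1 nodes hold copies."""
        start = bisect.bisect_left(self.positions, token(key))
        count = min(self.rf, len(self.tokens))
        # Modulo arithmetic wraps around the end of the ring.
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


def quorum(replication_factor: int) -> int:
    """Replicas that must respond at consistency level QUORUM."""
    return replication_factor // 2 + 1


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"], replication_factor=3)
print(ring.replicas("user:jon"))  # three distinct nodes; which ones depends on the hashes
print(quorum(3))                  # 2
```

With RF=3, QUORUM reads and QUORUM writes each need 2 of the 3 replicas, so every read overlaps at least one replica that saw the latest quorum write.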

Cassandra Storage

source: http://developer.rackspace.com/images/2013-03-27-rackspace-service-registry-status-update/vnodes.png

• SSTables are immutable
• Each column includes a timestamp of when it was written
• The same column can exist for a given key in multiple SSTables
• Deletes are written as tombstones
• SSTables are periodically merged (compaction)
• Compaction keeps the column with the latest timestamp on conflicts
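The merge rule described on this slide (newest timestamp wins, deletes are tombstones) can be illustrated with a small in-memory sketch. The `Cell`/`compact` names and the dict-of-dicts SSTable layout are hypothetical; real compaction streams sorted on-disk files, breaks timestamp ties deterministically, and only purges tombstones older than `gc_grace_seconds`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone (a delete)
    timestamp: int        # write time stored with every column


# Toy SSTable: row key -> column name -> Cell, treated as read-only once built.
SSTable = Dict[str, Dict[str, Cell]]


def compact(sstables: List[SSTable], purge_tombstones: bool = False) -> SSTable:
    """Merge SSTables into a new one, keeping the newest cell per (key, column)."""
    merged: SSTable = {}
    for table in sstables:
        for key, columns in table.items():
            row = merged.setdefault(key, {})
            for name, cell in columns.items():
                current = row.get(name)
                # Last write wins; on an exact tie this sketch keeps the cell seen first.
                if current is None or cell.timestamp > current.timestamp:
                    row[name] = cell
    if purge_tombstones:
        # Simplification: Cassandra only drops tombstones older than gc_grace_seconds.
        merged = {
            key: {name: cell for name, cell in row.items() if cell.value is not None}
            for key, row in merged.items()
        }
        merged = {key: row for key, row in merged.items() if row}
    return merged


older = {"jon": {"city": Cell("Palo Alto", 1), "team": Cell("media", 1)}}
newer = {"jon": {"city": Cell("Santa Monica", 5), "team": Cell(None, 4)}}
print(compact([older, newer]))
print(compact([older, newer], purge_tombstones=True))
```

Because the input tables are never modified and the result is a fresh table, the sketch mirrors the immutability of SSTables.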

Cassandra Writes

• A write can be sent to any node in the cluster; that node acts as the coordinator and forwards the write to the replicas that own the key

• Each replica appends the write to a commit log and saves it in memory in a “memtable”

• Memtables are flushed to disk periodically as SSTables

source: http://www.datastax.com/docs/_images/write_access.png
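A single replica's side of this write path can be sketched as follows. Everything here is a hypothetical in-memory model: the `Replica` class, the cell-count `flush_threshold` (Cassandra flushes based on memory thresholds) and truncating the whole commit log after a flush are simplifications, and coordinator routing is left out.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


class Replica:
    """Toy model of one node's write path: commit log, memtable, flushed SSTables."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.commit_log: List[Tuple[str, str, str]] = []  # append-only, for crash recovery
        self.memtable: Dict[str, Row] = {}                # in memory, mutable
        self.sstables: List[Dict[str, Row]] = []          # never modified after a flush
        self.flush_threshold = flush_threshold            # hypothetical: counted in cells

    def write(self, key: str, column: str, value: str) -> None:
        # 1. Append to the commit log first so the write is durable.
        self.commit_log.append((key, column, value))
        # 2. Apply it to the memtable; no existing data has to be read.
        self.memtable.setdefault(key, {})[column] = value
        # 3. Flush once the memtable is "full".
        if sum(len(row) for row in self.memtable.values()) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as a new, key-sorted SSTable and start an empty one."""
        if not self.memtable:
            return
        self.sstables.append({key: dict(self.memtable[key]) for key in sorted(self.memtable)})
        self.memtable = {}
        # Flushed data no longer needs replaying, so this sketch truncates the log.
        self.commit_log.clear()


node = Replica(flush_threshold=3)
node.write("jon", "city", "Santa Monica")
node.write("jon", "team", "media")
node.write("blake", "city", "Palo Alto")  # third cell triggers a flush
print(len(node.sstables), node.memtable)  # 1 {}
```

Note that no step reads existing data before writing, which is why writes are cheap compared to reads.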

Cassandra Reads

• Any server may be queried; it acts as the coordinator
• The coordinator contacts the nodes that hold the requested key
• Data is pulled from SSTables and merged
• The coordinator performs read repair if replicas return different versions
• Reads are a more time-consuming operation than writes

source: http://www.datastax.com/docs/_images/write_access.png
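The coordinator's job on a read (ask enough replicas, return the newest version, fix stale copies) can be modelled roughly like this. It is a sketch under assumptions: each hypothetical replica is a plain dict of `key -> (value, timestamp)`, the coordinator simply takes the first N replicas (Cassandra chooses them via its snitch), and only contacted replicas are repaired, whereas Cassandra 1.2 could also repair others in the background according to the table's `read_repair_chance`.

```python
from typing import Dict, List, Optional, Tuple

# Toy replica: key -> (value, write timestamp).
Replica = Dict[str, Tuple[str, int]]


def coordinator_read(replicas: List[Replica], key: str, consistency: int) -> Optional[str]:
    """Query `consistency` replicas, return the newest value, and repair stale ones."""
    contacted = replicas[:consistency]
    responses = [(replica, replica.get(key)) for replica in contacted]
    versions = [version for _, version in responses if version is not None]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[1])  # highest timestamp wins
    # Read repair: write the newest version back to any contacted replica that disagreed.
    for replica, version in responses:
        if version != newest:
            replica[key] = newest
    return newest[0]


a: Replica = {"jon": ("Palo Alto", 1)}
b: Replica = {"jon": ("Santa Monica", 2)}
c: Replica = {}
value = coordinator_read([a, b, c], "jon", consistency=2)
print(value)        # Santa Monica
print(a["jon"], c)  # ('Santa Monica', 2) {}
```

Comparing and merging several responses is part of why reads cost more than writes.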

MongoDB Advantages

• Very Flexible Documents

• Very Flexible Queries

• Full text search (as of 2.4)

• Aggregation Framework

• Geospatial Indexes / Queries

• Really good documentation

MongoDB Pitfalls

• Many queries will route to the entire cluster
• Overwriting documents / changing document sizes causes memory fragmentation problems (fixed with a db repair)
• Query language is awkward for humans
• Queries that go to disk pay an enormous penalty
• Max size of 256GB for sharding an existing collection

source: https://blog.serverdensity.com/map-reduce-and-mongodb/
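Why many queries hit the whole cluster can be seen with a toy version of mongos routing. This is an illustrative sketch, not MongoDB's implementation: the `Router` class, the integer `user_id` shard key and the one-chunk-per-shard layout are assumptions (real clusters hold many chunks per shard and the balancer moves them). A query with an equality match on the shard key goes to one shard; any query without it must be broadcast.

```python
import bisect
from typing import Dict, List


class Router:
    """Toy mongos: range chunks on an integer shard key, each chunk on one shard."""

    def __init__(self, shard_key: str, split_points: List[int], shards: List[str]) -> None:
        # Chunk i covers [split_points[i-1], split_points[i]); the end chunks are unbounded.
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one shard per chunk")
        self.shard_key = shard_key
        self.split_points = split_points
        self.shards = shards

    def targets(self, query: Dict[str, object]) -> List[str]:
        """Shards that must receive the query."""
        value = query.get(self.shard_key)
        if isinstance(value, int):
            # Equality match on the shard key: exactly one chunk, so one shard.
            return [self.shards[bisect.bisect_right(self.split_points, value)]]
        # Shard key absent: scatter-gather to every shard.
        return list(self.shards)


router = Router("user_id", split_points=[1000, 2000], shards=["shard0", "shard1", "shard2"])
print(router.targets({"user_id": 1500}))  # ['shard1']
print(router.targets({"team": "media"}))  # ['shard0', 'shard1', 'shard2']
```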

Cassandra Advantages

• Multi data center aware & reliable
• Fewer moving parts
• No DB / table locking
• Excellent with time series data (stats)
• Performance scales linearly as you add servers
• Optimized compaction options for traditional spinning disks and SSDs
• Lots of control over how your data is stored on disk

Cassandra Pitfalls

• Secondary indexes have hidden costs
• Individual reads (single rows) are not as fast as in other DBs
• The JVM can be intimidating (GC)
• Data modeling requires more planning
• Generally need to construct a table per query you intend to run
• Ad hoc queries, or queries with lots of permutations, can be very difficult to model
• We complement Cassandra with Elasticsearch for these types of queries (Solr & DataStax Enterprise are also good choices)
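The "table per query" habit from the bullets above amounts to writing the same fact once for every way it will be read. The sketch below models two hypothetical tables as Python dicts (in CQL each would be a separate table with its own PRIMARY KEY); the `AdStatsTables` class and its column names are illustrative assumptions, not SHIFT's schema.

```python
from typing import Dict


class AdStatsTables:
    """Toy model of query-driven tables: the same fact is written once per query."""

    def __init__(self) -> None:
        # "Table" 1: partition key = campaign, clustered by date.
        self.by_campaign: Dict[str, Dict[str, int]] = {}
        # "Table" 2: partition key = date, clustered by campaign.
        self.by_date: Dict[str, Dict[str, int]] = {}

    def record(self, campaign: str, date: str, clicks: int) -> None:
        # Denormalize at write time so each read below touches a single partition.
        self.by_campaign.setdefault(campaign, {})[date] = clicks
        self.by_date.setdefault(date, {})[campaign] = clicks

    def clicks_for_campaign(self, campaign: str) -> Dict[str, int]:
        """Query 1: all days for one campaign."""
        return dict(sorted(self.by_campaign.get(campaign, {}).items()))

    def clicks_on_date(self, date: str) -> Dict[str, int]:
        """Query 2: all campaigns for one day."""
        return dict(sorted(self.by_date.get(date, {}).items()))


tables = AdStatsTables()
tables.record("campaign1", "2013-06-01", 10)
tables.record("campaign2", "2013-06-01", 7)
tables.record("campaign1", "2013-06-02", 4)
print(tables.clicks_for_campaign("campaign1"))  # {'2013-06-01': 10, '2013-06-02': 4}
print(tables.clicks_on_date("2013-06-01"))      # {'campaign1': 10, 'campaign2': 7}
```

Every new access pattern needs another such table, which is why ad hoc queries with many permutations are better served by a search index.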

Media Manager Social Analytics

What is Media Manager?

• Ad buying and management tool for Facebook and Twitter

• We sync ~2 billion ad stats a month

• We roll up stats at multiple levels in real time

• 10-node C* (Cassandra) cluster on AWS high I/O instances

• Peaked at 150K queries / second

• Approx 150GB of data, growing 10% / week

Real time Rollups

• A single row per parent object type & date

• For any object (teams, folders, campaigns) we can perform a rollup for a given date by reading only a single row. This limits our I/O and is extremely efficient.

• New ad stats are propagated up immediately in rollups with very few reads.

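Diagram (rollup), one row per parent + date:

row key          columns
campaign+date    ad1: stats | ad2: stats | ad3: stats
folder+date      campaign1: stats | campaign2: stats | campaign3: stats

(each campaign+date row is rolled up into that campaign's column in its folder+date row)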
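The rollup rows described above can be sketched as a small in-memory model. Assumptions, all hypothetical: stats are plain click counts that roll up by summation, each sync delivers an ad's full value for the date (so its column is overwritten rather than incremented), and `RollupStore` and the hierarchy names are illustrative only. Propagating a change costs one single-row read per level of the hierarchy.

```python
from typing import Dict, Optional, Tuple


class RollupStore:
    """Toy model: one row per (parent, date); each child's stat is a column in that row."""

    def __init__(self, parent_of: Dict[str, Optional[str]]) -> None:
        # Hypothetical hierarchy, e.g. ad -> campaign -> folder; None marks the top level.
        self.parent_of = parent_of
        self.rows: Dict[Tuple[str, str], Dict[str, int]] = {}

    def total(self, obj: str, date: str) -> int:
        """Rollup for obj on date: read a single row and sum its columns."""
        return sum(self.rows.get((obj, date), {}).values())

    def record(self, ad: str, date: str, clicks: int) -> None:
        """Store an ad's synced stat and propagate the new totals up the hierarchy."""
        child, value = ad, clicks
        parent = self.parent_of.get(child)
        while parent is not None:
            # Overwrite the child's column in the parent's row for this date.
            self.rows.setdefault((parent, date), {})[child] = value
            # One single-row read per level yields the parent's new total.
            child, value = parent, self.total(parent, date)
            parent = self.parent_of.get(child)


hierarchy = {"ad1": "campaign1", "ad2": "campaign1", "ad3": "campaign2",
             "campaign1": "folder1", "campaign2": "folder1", "folder1": None}
store = RollupStore(hierarchy)
for ad, clicks in [("ad1", 10), ("ad2", 5), ("ad3", 7)]:
    store.record(ad, "2013-06-01", clicks)
print(store.total("campaign1", "2013-06-01"))  # 15
print(store.total("folder1", "2013-06-01"))    # 22
```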

Why Cassandra?

• Almost our entire DB is in our working set.

• We have rows on disk that are inconsistently sized, so MongoDB's document-size heuristics for preallocation are not useful.

• We could not tolerate unpredictable query behavior due to disk access.

SHIFT.com Collaboration Platform

Real time Collaboration

• Built for marketers

• Allows communication across departments and organizations

• 3rd Party Applications

Messaging

• Messages are fanned out to an entire team

• Teams may have hundreds of members

• Each member has their own view of their messages, with their own metadata on those messages (tags & unread status)

Message Inbox

user timeuuid1 timeuuid2 timeuuid3

jon msg1 msg2 msg3

blake msg3 msg1 msg2

• When a message is sent or replied to, we insert a record with a timeuuid into each recipient's stream; the record points to the message.

• Timeuuids are stored on disk in reverse order of their embedded timestamp (newest first)

• We can easily query the row for the first N items in the user's inbox

• We store multiple views as tags for each user to quickly surface messages in different contexts.
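The inbox layout above (one row per user, entries ordered newest-first by timeuuid, fan-out on send) can be modelled in plain Python. This is a sketch under assumptions: `timeuuid` is a hypothetical helper that builds a version-1 UUID for a given Unix time in milliseconds (the UUID flavour Cassandra calls timeuuid), the sorted list stands in for a table with a descending clustering order, and the `Inbox` class does not model tags or unread flags.

```python
import random
import uuid
from typing import Dict, List, Tuple

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid(unix_ms: int) -> uuid.UUID:
    """Build a version-1 (time-based) UUID whose embedded timestamp is unix_ms."""
    ts = unix_ms * 10_000 + UUID_EPOCH_OFFSET
    clock_seq = random.getrandbits(14)
    return uuid.UUID(fields=(
        ts & 0xFFFFFFFF,                   # time_low
        (ts >> 32) & 0xFFFF,               # time_mid
        ((ts >> 48) & 0x0FFF) | 0x1000,    # time_hi with version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,  # clock_seq_hi with RFC 4122 variant
        clock_seq & 0xFF,                  # clock_seq_low
        uuid.getnode(),                    # node
    ))


class Inbox:
    """Toy model: one row per user, entries clustered by timeuuid, newest first."""

    def __init__(self) -> None:
        self.rows: Dict[str, List[Tuple[uuid.UUID, str]]] = {}

    def deliver(self, team: List[str], message_id: str, sent_at_ms: int) -> None:
        """Fan out: insert one (timeuuid, message pointer) entry into every member's row."""
        entry = timeuuid(sent_at_ms)
        for user in team:
            row = self.rows.setdefault(user, [])
            row.append((entry, message_id))
            # Mimic a descending clustering order on the embedded timestamp.
            row.sort(key=lambda item: item[0].time, reverse=True)

    def latest(self, user: str, n: int) -> List[str]:
        """The first N entries of the user's row: a single-partition slice."""
        return [message_id for _, message_id in self.rows.get(user, [])[:n]]


inbox = Inbox()
inbox.deliver(["jon", "blake"], "msg1", sent_at_ms=1_000)
inbox.deliver(["jon", "blake"], "msg2", sent_at_ms=2_000)
inbox.deliver(["jon"], "msg3", sent_at_ms=3_000)
print(inbox.latest("jon", 2))     # ['msg3', 'msg2']
print(inbox.latest("blake", 10))  # ['msg2', 'msg1']
```

Only a pointer to the message is copied per member, so fanning out to a team of hundreds stays cheap.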

cqlengine: Python CQL3 mapper

cqlengine features

• CQL3 object mapper for Python
• Supports Cassandra 1.2
• Builds queries supporting the following:
  • TTLs
  • Per-query consistency
  • Blind table updates
  • Batch queries
  • Counters
  • Maps, sets, lists
• Schema management
• Per-table compaction settings
• Table polymorphism

Table Polymorphism

• A single table can hold heterogeneous objects
• We use this in Media Manager for ad types

campaign   ad   type
1          1    page_post
1          2    mobile_ad
1          3    application_ad
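The idea behind table polymorphism is that a discriminator column (here `type`) tells the mapper which class to build for each row. The plain-Python sketch below shows that mechanism only; it does not use cqlengine's API, and the `Ad` base class, its registry and the subclass names are hypothetical.

```python
from typing import Dict, Type


class Ad:
    """Base model: every row in the hypothetical ads table shares these columns."""

    registry: Dict[str, Type["Ad"]] = {}
    ad_type = ""

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Register each subclass under its discriminator value.
        Ad.registry[cls.ad_type] = cls

    def __init__(self, campaign: int, ad: int, **extra: object) -> None:
        self.campaign, self.ad, self.extra = campaign, ad, extra

    @staticmethod
    def from_row(row: Dict[str, object]) -> "Ad":
        """Instantiate the subclass named by the row's `type` column."""
        fields = dict(row)
        cls = Ad.registry[str(fields.pop("type"))]
        return cls(**fields)

    def to_row(self) -> Dict[str, object]:
        """Serialize back to a row, writing the discriminator value."""
        return {"campaign": self.campaign, "ad": self.ad, "type": self.ad_type, **self.extra}


class PagePostAd(Ad):
    ad_type = "page_post"


class MobileAd(Ad):
    ad_type = "mobile_ad"


class ApplicationAd(Ad):
    ad_type = "application_ad"


rows = [
    {"campaign": 1, "ad": 1, "type": "page_post"},
    {"campaign": 1, "ad": 2, "type": "mobile_ad"},
    {"campaign": 1, "ad": 3, "type": "application_ad"},
]
ads = [Ad.from_row(row) for row in rows]
print([type(ad).__name__ for ad in ads])  # ['PagePostAd', 'MobileAd', 'ApplicationAd']
```

Reading a campaign's partition returns all its ads in one query, yet each comes back as its own type.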

Upcoming Features

• Work seamlessly with multiple clusters
• Native driver integration
• Key cache / row cache configuration
• Cassandra 2.0 features
• Third-party plugins
  • session
  • flask
  • identity map

THANK YOU

PALO ALTO 650.804.8319

NEW YORK 646.649.2972

CHICAGO 312.465.2152

SANTA MONICA 310.310.8315

www.shift.com

Jon [email protected]

@rustyrazorblade

Blake [email protected]

@beggleston