cassandra at no_sql

Apache Cassandra: NoSQL, Yes to Scale!

Srisatish Ambati, @srisatish

NoSQL: know your queries.

Points

• Use cases
• Why Cassandra?
• Use case: Hadoop, Brisk
• FUD: Consistency
• Why Facebook is not using Cassandra?
• Community, Code, Tools
• Q&A

Users. Netflix.
Key by Customer, read-heavy
Key by Customer:Movie, write-heavy

Time series: (several customers)
Periodic readings: dev0, dev1…
deviceID:metric:timestamp -> value
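The composite key above can be sketched in a few lines. This is a minimal in-memory illustration, not Cassandra client code: `make_key`, `TimeSeriesStore`, and the zero-padded timestamp width are assumptions made for the example. It shows why a sorted `deviceID:metric:timestamp` key turns "readings for one device and metric in a time window" into a single contiguous range scan.

```python
from bisect import bisect_left, bisect_right, insort


def make_key(device_id: str, metric: str, timestamp: int) -> str:
    # Zero-pad the timestamp so string order matches numeric order
    # (the 10-digit width is an assumption for this sketch).
    return f"{device_id}:{metric}:{timestamp:010d}"


class TimeSeriesStore:
    """Toy stand-in for a store that keeps keys in sorted order."""

    def __init__(self):
        self._keys = []    # composite keys, kept sorted
        self._values = {}  # composite key -> reading

    def write(self, device_id, metric, timestamp, value):
        key = make_key(device_id, metric, timestamp)
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def read_range(self, device_id, metric, start, end):
        # All readings for one device/metric with start <= timestamp <= end.
        lo = bisect_left(self._keys, make_key(device_id, metric, start))
        hi = bisect_right(self._keys, make_key(device_id, metric, end))
        return [self._values[k] for k in self._keys[lo:hi]]
```

Usage: after writing readings for `dev0` and `dev1`, `read_range("dev0", "cpu", 100, 200)` touches only the `dev0:cpu:` slice of the key space.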

Metrics are typically a much larger dataset than users.

Why Cassandra?

Operational simplicity: peer-to-peer

Replication:
Multi-datacenter
Multi-region EC2, AWS
Multi-availability zones

dc1 dc2

reads local

“Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled

4.21.2011, Amazon Web Services outage:

Netflix was running on AWS.

Fast, durable writes. Fast reads.

Writes: sequential, append-only. ~1-5 ms
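A rough sketch of why append-only writes are both fast and durable: each write is one sequential append to a log plus an in-memory update, and the log can be replayed after a crash. The class and function names, the JSON record format, and the fsync-on-every-write policy are all assumptions chosen for brevity; this is not how Cassandra's commit log is actually implemented.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy commit log + in-memory table (hypothetical, for illustration)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}
        self._log = open(log_path, "a", encoding="utf-8")

    def write(self, key, value):
        # Sequential append: no seek, no read-before-write.
        self._log.write(json.dumps({"k": key, "v": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # durable once this returns
        self.memtable[key] = value

    def close(self):
        self._log.close()

    @classmethod
    def recover(cls, log_path):
        # Rebuild the in-memory table by replaying the log in order.
        store = cls(log_path)
        with open(log_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                store.memtable[record["k"]] = record["v"]
        return store


def demo():
    path = os.path.join(tempfile.mkdtemp(), "commit.log")
    store = AppendOnlyStore(path)
    store.write("cust1:movieA", "play")
    store.write("cust1:movieA", "pause")  # later write wins on replay
    store.close()
    recovered = AppendOnlyStore.recover(path)
    recovered.close()
    return recovered.memtable
```

Overwrites never modify earlier records; the newer record simply shadows the older one when the log is replayed.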

Reads: local. Key & row caches (also JNA-based off-heap, "0xffheap"), indexes, materialized

Clients: CQL, Thrift
pycassa, phpcassa
Hector, Pelops (Scala, Ruby, Clojure)

Use case #3: Hadoop
HDFS, Cassandra, Hive
Logs, stats, analytics

Brisk
Truly peer-to-peer Hadoop.

Namenode decomposition, explained.

Use column families (tables):
inode
sblock

Near-real-time Hadoop
Low latency: cassandra_dc nodes
Batch analytics: brisk_dc nodes

FUD, acronym: fear, uncertainty, doubt.

Consistency: R + W > N
ORACLE, 2-node: R=1, W=2, N=2 (T=2)
DNS

* N is the replication factor, not to be confused with T, the total number of nodes.
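The R + W > N rule can be checked by brute force. The sketch below (function names are hypothetical) enumerates every possible set of W replicas that acknowledged a write and every set of R replicas consulted by a read, and confirms that the two sets always share at least one replica exactly when R + W > N.

```python
from itertools import combinations


def satisfies_rule(r: int, w: int, n: int) -> bool:
    # The rule from the slide: read replicas + write replicas > replication factor.
    return r + w > n


def reads_always_see_latest_write(r: int, w: int, n: int) -> bool:
    # True if every possible read set intersects every possible write set,
    # so at least one replica in any read holds the latest write.
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

For the 2-node example above (R=1, W=2, N=2), both functions return True: the write reaches both replicas, so a read from either one sees it.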

Tunable, flexible.
For high consistency: read: quorum, write: quorum.
For high availability: high W, low R.
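To make the tuning concrete, here is a small sketch of how many replicas each level must reach and how many replica failures it can ride out. ONE, QUORUM, and ALL are Cassandra consistency levels; the helper names and the dictionary layout are assumptions made for this example.

```python
def quorum(n: int) -> int:
    # Majority of the N replicas.
    return n // 2 + 1


def replicas_required(level: str, n: int) -> int:
    # Replicas that must respond for a request at the given level.
    return {"ONE": 1, "QUORUM": quorum(n), "ALL": n}[level]


def tolerated_failures(level: str, n: int) -> int:
    # Replicas that can be down while requests at this level still succeed.
    return n - replicas_required(level, n)


def describe(n: int) -> dict:
    return {
        level: (replicas_required(level, n), tolerated_failures(level, n))
        for level in ("ONE", "QUORUM", "ALL")
    }
```

With N=3, quorum reads and quorum writes each need 2 replicas, so R + W = 4 > 3 while one replica can still be down.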

Inbox Search: 600+ cores, 120+ TB (2008).
Went from 100m to 500m users.

Average NoSQL deployment size: ~6-12 nodes.

Use case #5: search
Apache Solr + Cassandra = Solandra

Other inbox/file searches: Xobni, C3

github.com/tjake/solandra

“Eventual consistency is harder to program.”
Mostly immutable data.
Complex systems at scale.

Miscellaneous. Myth: data loss, partial rows.
Writes are durable.

Three more reasons for Cassandra...

Tools
AMIs, OpsCenter, DataStax
AppDynamics

Beautiful C0de

= new code(); //less is more
~90k.
java.concurrent.@annotate.
bloomfilters, merkletrees.
non-blocking, staged-event-driven.
bigtable, dynamo.
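Of the structures named above, the bloom filter is the easiest to sketch. Cassandra uses bloom filters on the read path to skip data files that cannot contain a key. The version below is a minimal illustration only: the class name, the default sizes, and the use of SHA-256 as the hash are assumptions for the example, not Cassandra's implementation.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

A False answer lets a read skip a file entirely; a True answer still requires checking the file, since false positives are possible.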

Current & Future Focus:
Distributed counters, CQL.
Simple client.
Operational smoothing.
Compaction.

Community
Robust. Rapid. #
Professional support from DataStax.

Engineers: independent, startups, large companies, Rackspace, Twitter, Netflix...

Come join the efforts!

Use case #4: first NoSQL, then scale!
SimpleDB -> Cassandra
MongoDB -> Cassandra

Copyright: xkcd

Copyright: plantoys

… more than one way to do it!

Summary: high-scale, peer-to-peer distributed database.

Q&A
@srisatish
