cassandra at no_sql
DESCRIPTION
SFJava, SFNoSQL, SFMySQL, Marakana & Microsoft come together for a presentation evening of three NoSQL technologies - Apache Cassandra, Mongodb, Hadoop. This talk lays out a few talking points for Apache Cassandra.TRANSCRIPT
Apache Cassandra: NoSQL, Yes to Scale!
srisatish ambati@srisatish
NoSQL -Know your queries.
points
• Usecases• Why cassandra?• Usecase: Hadoop, Brisk• FUD: Consistency • Why facebook is not using Cassandra?• Community, Code, Tools• Q&A
Users. Netflix.Key by Customer, read-heavyKey by Customer:Movie, write-heavy
TimeSeries: (several customers)periodic readings: dev0, dev1…deviceID:metric:timestamp ->value
Metrics typically way larger dataset than users.
Why Cassandra?
Operational simplicity peer-to-peer
Operational simplicity peer-to-peer
Replication: Multi-datacenterMulti-region ec2Multi-availability zones
Replication: Multi-datacenterMulti-region ec2, awsMulti-availability zones
dc1 dc2
reads local
“Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
4.21.2011, Amazon Web Services outage:
Netflix was running on AWS.
4.21.2011, Amazon Web Services outage:
fast durable writes. fast reads.
Writes Sequential, append-only.~1-5ms
Reads Local Key & row caches, (also, jna-based 0xffheap) indexes, materialized
Clients: cql, thrift pycassa, phpcassa hector, pelops (scala, ruby, clojure)
Usecase #3: hadoopHdfs cassandra hiveLogs stats analytics
BriskTruly peer-to-peer hadoop.
Namenode decomposition, explained.
Use column families (tables)inodesblock
near-real time hadoopLow latency: cassandra_dc nodesBatch Analytics: brisk_dc nodes
FUD, acronym: fear, uncertainty, doubt.
Consistency: R + W > N ORACLE, 2-node: R=1, W=2, N=2,(T=2)DNS
* N is replication factor. Not to be confused with T=total #of nodes
Tune-able, flexibility.For High Consistency:
read:quorum, write:quorumFor High Availability:
high W, low R.
Inbox Search: 600+cores.120+TB (2008)Went from 100-500m users.
Average NoSQL deployment size: ~6-12 nodes.
Usecase #5: searchApache Solr + Cassandra = Solandra
Other inbox/file Searches:xobni, c3
github.com/tjake/solandra
“Eventual consistency is harder to program.”mostly immutable data.complex systems at scale.
Miscellaneous, Myth: data-loss, partial rows.writes are durable.
Three more reasons for Cassandra...
ToolsAMIs, OpsCenter, DataStaxAppDynamics
B e a u t i f u l C 0 d e
= new code(); //less is more~90k.java.concurrent.@annotate. bloomfilters, merkletrees.non-blocking, staged-event-driven.bigtable, dynamo.
Current & Future Focus:Distributed Counters, CQL.Simple client.operational smoothening.
compaction.
CommunityRobust. Rapid. #Professional support from DataStax.
engineers: independent,startups, large companies, Rackspace, Twitter, Netflix..
Come join the efforts!
Usecase #4: first NoSQL, then scale!simpledb Cassandra mongodb Cassandra
Copyright: xkcd
Copyright: plantoys
… more than one way to do it!
Summary -high scale peer-to-peer distributed database.
Q&A@srisatish