Big Data Grows Up
A (re)introduction to Cassandra

Robbie Strickland

Who am I?

Robbie Strickland
Software Development Manager
The Weather Channel

@dont_use_twitter

Who am I?

● Cassandra user/contributor since 2010
● … it was at release 0.5 back then
● 4 years? Oracle DBAs aren’t impressed
● Done lots of dumb stuff with Cassandra
● … and some really awesome stuff too

Cassandra in 2010

Cassandra in 2014

Why Cassandra?

It’s fast:

● No locks
● Tunable consistency
● Sequential R/W
● Decentralized
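"Tunable consistency" means every read and write chooses how many replicas must answer. A minimal sketch of the usual overlap rule, assuming a single data center with replication factor `rf`: a read is guaranteed to see the latest acknowledged write when R + W > RF. ONE, QUORUM and ALL are real Cassandra consistency levels; the helper functions are hypothetical.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond at a given consistency level (single DC)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level: {level}")


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every possible read replica set overlaps every write replica set."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


if __name__ == "__main__":
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read, write, read_sees_latest_write(read, write, rf=3))
```

With RF=3, QUORUM reads plus QUORUM writes (2 + 2 > 3) overlap on at least one replica, while ONE/ONE (1 + 1) does not; you trade latency for consistency per request.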

Why Cassandra?

It scales (linearly):

● Multi data center
● No SPOF
● DHT
● Hadoop integration

Why Cassandra?

It’s fault tolerant:

● Automatic replication
● Masterless
● Failed nodes replaced with ease

… a lot in the last year (ish)

What’s different?

What’s new?

● Virtual nodes
● O(n) data moved off-heap
● CQL3 (and defining schemas)
● Native protocol/driver
● Collections
● Lightweight transactions
● Compaction throttling that actually works

What’s gone?

● Manual token management
● Supercolumns
● Thrift (if you use the native driver)
● Directly managing storage rows

What’s still the same?

● Still not an RDBMS
● Still no joins (see above)
● Still no ad-hoc queries (see above again)
● Still requires a denormalized data model (^^)
● Still need to know what the heck you’re doing

Linear scalability without the migraine

Token Management

The old way
● 1 token per node
● Assigned manually
● Adding nodes == reassignment of all tokens
● Node rebuild heavily taxes a few nodes
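Why adding a node meant reassigning every token: with one token per node, a balanced ring spaces the tokens evenly across the partitioner's range. This sketch assumes the RandomPartitioner range of 0 to 2**127 (the Murmur3Partitioner, the default since 1.2, uses signed 64-bit tokens instead, but the arithmetic is the same). The helper names are hypothetical.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token range: [0, 2**127)


def balanced_tokens(node_count: int) -> list:
    """Evenly spaced initial tokens, one per node (the manual, pre-vnode approach)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def tokens_changed(old: list, new: list) -> int:
    """How many of the existing nodes end up with a different token."""
    return sum(1 for before, after in zip(old, new) if before != after)


if __name__ == "__main__":
    four = balanced_tokens(4)
    five = balanced_tokens(5)
    print(tokens_changed(four, five), "of", len(four), "existing nodes need a new token")
```

Going from 4 to 5 nodes, only the node sitting at token 0 keeps its position; the other three have to move, and every move streams data.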

[Figure: cluster with no vnodes (ring segments labeled A through F)]

… enter Vnodes
● n tokens per node
● Assigned magically
● Adding nodes == painless
● Node rebuild distributed across many nodes
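A minimal sketch of vnodes, assuming each node picks `num_tokens` random tokens (the real `cassandra.yaml` setting, commonly 256) and ignoring replication. A key's token belongs to the first ring token at or after it, wrapping around. Because a new node's tokens land all over the ring, it takes small ranges from many existing nodes instead of one big range from a single neighbor. The token space is simplified to unsigned 64-bit and the helper names are hypothetical.

```python
import bisect
import random

RING_SIZE = 2 ** 64  # simplified token space; Murmur3 really uses signed 64-bit tokens


def build_ring(nodes, num_tokens, rng):
    """Give each node num_tokens random tokens; return sorted (token, node) pairs."""
    ring = [(rng.randrange(RING_SIZE), node) for node in nodes for _ in range(num_tokens)]
    ring.sort()
    return ring


def owner(ring, token):
    """A token belongs to the first ring token at or after it, wrapping around."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


def ownership(ring):
    """Total range size per node; each ring token owns (previous token, own token]."""
    owned = {}
    for i, (token, node) in enumerate(ring):
        prev = ring[i - 1][0]  # i == 0 wraps around to the last token
        owned[node] = owned.get(node, 0) + (token - prev) % RING_SIZE
    return owned


if __name__ == "__main__":
    rng = random.Random(42)
    ring = build_ring(["A", "B", "C", "D"], num_tokens=32, rng=rng)
    new_tokens = [rng.randrange(RING_SIZE) for _ in range(32)]
    # Each new vnode splits a range currently owned by whoever owns that token
    donors = {owner(ring, t) for t in new_tokens}
    print("new node E streams data from:", sorted(donors))
```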

[Figure: cluster with vnodes (ring segments labeled A through N)]

Node rebuild without Vnodes

Node rebuild with Vnodes

because the JVM sometimes sucks

Going Off-heap

Why go off-heap

● GC overhead
● JVM no good with big heap sizes
● GC overhead
● GC overhead
● GC overhead

O(n) data structures

● Row cache
● Bloom filters
● Compression offsets
● Partition summary

… all these are moved off-heap

New memory allocation

[Figure: native memory holds the row cache, Bloom filters, compression offsets, and partition summary; the JVM heap holds the partition key cache]

Or, how to build a killer data store without a crappy interface

Death of a (Thrift) Salesman

Reasons not to ditch Thrift

● Lots of client libraries still use it
● You finally got it installed
● You didn’t know there was another choice
● It sucks less than many alternatives

… in spite of all those benefits, you really should ditch Thrift because:

● It requires your entire result set to fit into RAM on both client and server

● The native protocol is better, faster, and supports all the new features

● Thrift-based client libraries are always a step behind

● It’s going away eventually

… and did I mention ...

It requires your entire result set to fit into RAM on both client and server!!!

Requesting too much data

really catchy tag line here

Going Native

Native protocol

● It’s binary, making it lighter weight
● It supports cursors (FTW!)
● It supports prepared statements
● Cluster awareness built-in
● Either synchronous or asynchronous ops
● Only supports CQL-based operations
● Can be used side-by-side with Thrift
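The "cursors" point is what fixes the Thrift RAM problem: the client fetches one page of rows at a time and hands back a paging state to get the next page, so neither side materializes the full result set. This simulation assumes a hypothetical `fetch_page` server call; in the DataStax drivers the equivalent knob is the fetch size (for example `fetch_size` in the Python driver), and paging itself arrived with the native protocol in Cassandra 2.0.

```python
from typing import Iterator, List, Optional, Tuple

# Stand-in for a large partition living on the server (hypothetical data)
SERVER_ROWS = [f"row-{i}" for i in range(10)]


def fetch_page(paging_state: Optional[int], fetch_size: int) -> Tuple[List[str], Optional[int]]:
    """Hypothetical server call: return one page plus the state needed to resume."""
    start = paging_state or 0
    end = start + fetch_size
    page = SERVER_ROWS[start:end]
    next_state = end if end < len(SERVER_ROWS) else None  # None means no more pages
    return page, next_state


def iterate_rows(fetch_size: int) -> Iterator[str]:
    """Client-side iterator: only one page is held in memory at a time."""
    paging_state = None
    while True:
        page, paging_state = fetch_page(paging_state, fetch_size)
        yield from page
        if paging_state is None:
            break


if __name__ == "__main__":
    for row in iterate_rows(fetch_size=4):
        print(row)
```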

Native drivers

From DataStax: Java, C#, Python

… other community supported drivers available

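Native query example
import com.datastax.driver.core.Cluster
// host1..host3, myKey, col1 and col2 are placeholders defined elsewhere
val cluster = Cluster.builder().addContactPoints(host1, host2, host3).build()
val session = cluster.connect()
// Prepare once (requires a session), then bind values on each execution
val insert = session.prepare("INSERT INTO myKsp.myTable (myKey, col1, col2) VALUES (?,?,?)")
val select = session.prepare("SELECT * FROM myKsp.myTable WHERE myKey = ?")
session.execute(insert.bind(myKey, col1, col2))
val result = session.execute(select.bind(myKey))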

Or, how to make Cassandra more awesome while simultaneously irritating early adopters

Wait, was that SQL?!!

Introducing CQL3

● Because the first two attempts sucked
● Stands for “Cassandra Query Language”
● Looks a heck of a lot like SQL
● … but isn’t
● Substantially lowers the learning curve
● … but also makes it easier to screw up
● An abstraction over the storage rows

Storage rows
[default@unknown] create keyspace Library;
[default@unknown] use Library;
[default@Library] create column family Books
... with comparator=UTF8Type
... and key_validation_class=UTF8Type
... and default_validation_class=UTF8Type;
[default@Library] set Books['Patriot Games']['author'] = 'Tom Clancy';
[default@Library] set Books['Patriot Games']['year'] = '1987';
[default@Library] list Books;

RowKey: Patriot Games
=> (name=author, value=Tom Clancy, timestamp=1393102991499000)
=> (name=year, value=1987, timestamp=1393103015955000)

Storage rows - composites
[default@Library] create column family Authors
... with key_validation_class=UTF8Type
... and comparator='CompositeType(LongType,UTF8Type,UTF8Type)'
... and default_validation_class=UTF8Type;
[default@Library] set Authors['Tom Clancy']['1987:Patriot Games:publisher'] = 'Putnam';
[default@Library] set Authors['Tom Clancy']['1987:Patriot Games:ISBN'] = '0-399-13241-4';
[default@Library] set Authors['Tom Clancy']['1993:Without Remorse:publisher'] = 'Putnam';
[default@Library] set Authors['Tom Clancy']['1993:Without Remorse:ISBN'] = '0-399-13825-0';
[default@Library] list Authors;

RowKey: Tom Clancy
=> (name=1987:Patriot Games:ISBN, value=0-399-13241-4, timestamp=1393104011458000)
=> (name=1987:Patriot Games:publisher, value=Putnam, timestamp=1393103948577000)
=> (name=1993:Without Remorse:ISBN, value=0-399-13825-0, timestamp=1393104109214000)
=> (name=1993:Without Remorse:publisher, value=Putnam, timestamp=1393104083773000)

CQL - simple intro
cqlsh> CREATE KEYSPACE Library WITH REPLICATION = {'class':'SimpleStrategy', 'replication_factor':1};
cqlsh> use Library;
cqlsh:library> CREATE TABLE Books (
... title varchar,
... author varchar,
... year int,
... PRIMARY KEY (title)
... );
cqlsh:library> INSERT INTO Books (title, author, year) VALUES ('Patriot Games', 'Tom Clancy', 1987);
cqlsh:library> INSERT INTO Books (title, author, year) VALUES ('Without Remorse', 'Tom Clancy', 1993);

CQL - simple intro

Storage rows:

CQL - composite key
CREATE TABLE Authors (
  name varchar,
  year int,
  title varchar,
  publisher varchar,
  ISBN varchar,
  PRIMARY KEY (name, year, title)
)
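How the Authors table's CQL rows map onto the storage rows from the cassandra-cli example: the partition key `name` becomes the row key, and each regular column becomes a cell whose name is prefixed with the clustering values (year, title). This is a simplified model: real storage encodes composite names in binary and also writes an empty row-marker cell per CQL row. Unquoted CQL identifiers are lower-cased, so the `ISBN` column is stored as `isbn`. The function name is hypothetical.

```python
from collections import defaultdict

PARTITION_KEY = "name"
CLUSTERING = ("year", "title")  # PRIMARY KEY (name, year, title)

rows = [
    {"name": "Tom Clancy", "year": 1993, "title": "Without Remorse",
     "publisher": "Putnam", "isbn": "0-399-13825-0"},
    {"name": "Tom Clancy", "year": 1987, "title": "Patriot Games",
     "publisher": "Putnam", "isbn": "0-399-13241-4"},
]


def to_storage_rows(cql_rows):
    """Group CQL rows by partition key; prefix each regular column with clustering values."""
    storage = defaultdict(dict)
    for row in cql_rows:
        prefix = tuple(row[c] for c in CLUSTERING)
        for column, value in row.items():
            if column == PARTITION_KEY or column in CLUSTERING:
                continue  # key columns live in the row key / cell name, not as cells
            storage[row[PARTITION_KEY]][prefix + (column,)] = value
    # Cells inside a storage row are kept sorted by cell name
    return {key: dict(sorted(cells.items())) for key, cells in storage.items()}


if __name__ == "__main__":
    for key, cells in to_storage_rows(rows).items():
        print("RowKey:", key)
        for name, value in cells.items():
            print("=> (name=%s, value=%s)" % (":".join(map(str, name)), value))
```

Inserting the rows out of order doesn't matter: the cells come back sorted by year, then title, then column name, just like the cli listing.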

CQL - composite key

Storage rows:

Keys and Filters

● Ad hoc queries are NOT supported
● Query by key
● Key must include all potential filter columns
● Must include partition key in filter
● Subsequent filters must follow the clustering column order
● Only the last filter can be a range
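The rules above can be sketched as a checker over a WHERE clause. This is a simplified, hypothetical model: it treats every restriction as either `=` or a range operator, and ignores secondary indexes, `IN`, token() ranges and `ALLOW FILTERING`.

```python
from typing import Dict, Sequence


def query_allowed(partition_key: Sequence[str], clustering: Sequence[str],
                  where: Dict[str, str]) -> bool:
    """Check a WHERE clause against the primary key rules (simplified).

    `where` maps column name -> operator, e.g. {"author": "=", "year": ">"}.
    """
    # Every partition key column must be restricted with equality
    if any(where.get(col) != "=" for col in partition_key):
        return False
    # Restricted clustering columns must form a prefix, in declared order
    restricted = [col for col in clustering if col in where]
    if restricted != list(clustering[:len(restricted)]):
        return False
    # Columns outside the primary key would need a secondary index
    if any(col not in partition_key and col not in clustering for col in where):
        return False
    # Only the last restricted clustering column may be a range
    return all(where[col] == "=" for col in restricted[:-1])


if __name__ == "__main__":
    # PRIMARY KEY (author, year)
    print(query_allowed(["author"], ["year"], {"author": "=", "year": ">"}))  # True
    print(query_allowed(["author"], ["year"], {"year": "="}))                 # False
```

For the Books examples that follow, swapping the order of the primary key columns changes which of these queries pass.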

Example - Books table
CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (title)
)

Example - Books table
CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (author, title)
)

Example - Books table
CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (author, year)
)

Example - Books table
CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (year, author)
)

Secondary Indexes

● Allows query-by-value
● CREATE INDEX myIdx ON myTable (myCol)
● Works well on low cardinality fields
● Won’t scale for high cardinality fields
● Don’t overuse it -- not a quick fix for a bad data model

Example - Books table
CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (author)
)
CREATE INDEX Books_year ON Books(year)
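Why cardinality matters: Cassandra's secondary indexes are local, so each node indexes only the rows it stores. A query by partition key goes straight to the owning node, but a query by indexed value may have to ask every node. With a low-cardinality column each node returns many matches, so the fan-out pays off; with a high-cardinality column you ask the whole cluster for a handful of rows. This toy model uses hypothetical placement (CRC32 of the author) and ignores replication.

```python
import zlib
from collections import defaultdict

NODES = ["A", "B", "C", "D"]  # hypothetical four-node cluster


def node_for(author: str) -> str:
    """Hypothetical placement: hash the partition key (author) onto one node."""
    return NODES[zlib.crc32(author.encode("utf-8")) % len(NODES)]


class ToyCluster:
    def __init__(self):
        self.rows = defaultdict(dict)  # node -> {author: year}
        self.year_index = defaultdict(lambda: defaultdict(set))  # node -> year -> {author}

    def insert(self, author: str, year: int) -> None:
        node = node_for(author)
        self.rows[node][author] = year
        self.year_index[node][year].add(author)  # each node indexes only its own rows

    def select_by_author(self, author: str):
        """WHERE author = ?: returns (row, nodes consulted); one node is enough."""
        return self.rows[node_for(author)].get(author), 1

    def select_by_year(self, year: int):
        """WHERE year = ?: returns (matches, nodes consulted); every node is asked."""
        matches = set()
        for node in NODES:
            matches |= self.year_index[node].get(year, set())
        return matches, len(NODES)
```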

Composite Partition Keys

● PRIMARY KEY((year, author), title)
● Creates a more granular shard key
● Can be useful to make certain queries more efficient, or to better distribute data
● Updates sharing a partition key are atomic and isolated
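A small sketch of the "more granular shard key" point: count CQL rows per partition under PRIMARY KEY (year, author, title), where all books from one year share a partition, versus PRIMARY KEY ((year, author), title), where each (year, author) pair gets its own. The sample rows are illustrative only.

```python
from collections import Counter

# Illustrative (year, author, title) rows
books = [
    (1987, "Tom Clancy", "Patriot Games"),
    (1987, "Stephen King", "Misery"),
    (1987, "Stephen King", "The Tommyknockers"),
    (1993, "Tom Clancy", "Without Remorse"),
]


def partition_sizes(rows, partition_key):
    """Count CQL rows per partition for a given partition-key extractor."""
    return Counter(partition_key(row) for row in rows)


# PRIMARY KEY (year, author, title): the partition key is year alone
year_only = partition_sizes(books, lambda r: r[0])
# PRIMARY KEY ((year, author), title): the partition key is the (year, author) pair
year_author = partition_sizes(books, lambda r: (r[0], r[1]))

if __name__ == "__main__":
    print(dict(year_only))
    print(dict(year_author))
```

The trade-off: with the composite partition key, `WHERE year = 1987` alone no longer identifies a partition; queries must supply both year and author.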

Example - Books table
CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY ((year, author), title)
)

Example - Books table
CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (year, author, title)
)

denormalization done well

Collections

Supported types

● Sets - ordered naturally
● Lists - ordered by index
● Maps - key/value pairs

Caveats

● Max 64K items in a collection
● Max 64KB per item
● Collections are read in their entirety, so keep them small

Sets

[Figure: set storage layout; cell name = set name + item value]

Lists

[Figure: list storage layout; cell name = list name + ordering metadata, cell value = list item value]

Maps

[Figure: map storage layout; cell name = map name + key, cell value = value]
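The three layouts can be sketched as functions that flatten a collection into (cell name, cell value) pairs. A set stores the item in the cell name with an empty value, so items come back in natural sort order; a list puts a time-based ordering key (a TimeUUID in Cassandra, an increasing counter here) in the cell name; a map puts the key in the cell name. The function names and the counter are hypothetical stand-ins.

```python
from itertools import count

_clock = count()  # stand-in for the time-based UUIDs Cassandra uses to order list items


def set_cells(name, items):
    """Set: the item itself is the cell name; the cell value is empty."""
    return [((name, item), b"") for item in sorted(items)]


def list_cells(name, items):
    """List: an increasing ordering key forms the cell name; the item is the value."""
    return [((name, next(_clock)), item) for item in items]


def map_cells(name, entries):
    """Map: the map key forms the cell name; the map value is the cell value."""
    return [((name, key), value) for key, value in sorted(entries.items())]


if __name__ == "__main__":
    print(set_cells("tags", {"thriller", "cold war"}))
    print(list_cells("history", ["checked out", "returned", "checked out"]))
    print(map_cells("prices", {"paperback": "7.99", "hardcover": "24.95"}))
```

Since every element is a separate cell but the whole collection is read back at once, a large collection means a large read, which is why the caveat says to keep them small.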

(tracing on)

TRON

Using tracing

● In cqlsh, “tracing on”
● … enjoy!

Example

Antipattern
CREATE TABLE WorkQueue (
  name varchar,
  time bigint,
  workItem varchar,
  PRIMARY KEY (name, time)
)

… do a bunch of inserts ...
SELECT * FROM WorkQueue WHERE name='ToDo' ORDER BY time ASC;
DELETE FROM WorkQueue WHERE name='ToDo' AND time=[some_time];

Antipattern - enqueue

Antipattern - dequeue

Antipattern

20k tombstones!! 13ms of 17ms spent reading tombstones
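Why the queue gets slower: a delete doesn't remove data, it writes a tombstone, and tombstones stay until `gc_grace_seconds` has passed and compaction drops them. Reading the head of the queue means scanning past every tombstone in front of it. A minimal in-memory model of one WorkQueue partition (class and method names are hypothetical):

```python
class QueuePartition:
    """One WorkQueue partition: cells sorted by time; deletes leave tombstones."""

    def __init__(self):
        self.cells = []  # [time, work_item, deleted], kept in clustering (time) order

    def enqueue(self, time, item):
        self.cells.append([time, item, False])
        self.cells.sort(key=lambda cell: cell[0])

    def dequeue(self):
        """Read the oldest live item, then delete it; returns (item, tombstones scanned)."""
        tombstones_read = 0
        for cell in self.cells:
            if cell[2]:
                tombstones_read += 1  # deleted cells still have to be read and skipped
                continue
            cell[2] = True  # the DELETE writes a tombstone instead of removing the cell
            return cell[1], tombstones_read
        return None, tombstones_read


if __name__ == "__main__":
    q = QueuePartition()
    for t in range(5):
        q.enqueue(t, f"job-{t}")
    print([q.dequeue() for _ in range(5)])
```

Each dequeue scans one more tombstone than the last, so draining N items reads N(N-1)/2 tombstones in total; at 20k deletes you get the trace above.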

(no it’s not ACID)

Lightweight Transactions

Primer

● Supports basic Compare-and-Set ops
● Provides linearizable consistency
● … aka serial isolation
● Uses “Paxos light” under the hood
● Still expensive -- four round trips!
● For most cases quorum reads/writes will be sufficient

Usage
INSERT INTO Users (login, name)
VALUES ('rs_atl', 'Robbie Strickland')
IF NOT EXISTS;

UPDATE Users
SET password='super_secure_password'
WHERE login='rs_atl'
IF reset_token='some_reset_token';
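The contract these two statements give you is compare-and-set: the write only applies if the condition holds, and the caller learns whether it did (Cassandra returns an `[applied]` column, plus the current values when the write was rejected). This sketch models only that contract on a single in-memory table; the Paxos rounds that make it safe across replicas are deliberately left out, and the class is hypothetical.

```python
from typing import Dict, Optional, Tuple

Row = Dict[str, str]


class UsersTable:
    """In-memory stand-in for the Users table, keyed by login (no Paxos here)."""

    def __init__(self):
        self.rows: Dict[str, Row] = {}

    def insert_if_not_exists(self, login: str, **values: str) -> Tuple[bool, Optional[Row]]:
        """Returns (applied, existing row when not applied)."""
        existing = self.rows.get(login)
        if existing is not None:
            return False, existing  # rejected: caller sees the row that got there first
        self.rows[login] = {"login": login, **values}
        return True, None

    def update_if(self, login: str, expected: Row, **values: str) -> bool:
        """Apply the update only if every expected column currently matches."""
        row = self.rows.get(login)
        if row is None or any(row.get(k) != v for k, v in expected.items()):
            return False
        row.update(values)
        return True
```

Two clients racing to register `rs_atl` both send the INSERT; exactly one sees `applied == True`, which is the guarantee plain quorum writes can't give you.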

Other cool stuff

● Triggers (experimental)
● Batching multiple requests
● Leveled compaction
● Configuration via CQL
● Gossip-based rack/DC configuration

Thank you!

