StampedeCon 2014: Cassandra in the Real World

Post on 30-Oct-2014


DESCRIPTION

Three use cases of Apache Cassandra in real-world implementations, and the best practices distilled from them.

TRANSCRIPT

STAMPEDECON 2014

CASSANDRA IN THE REAL WORLD

Nate McCall @zznate
Co-Founder & Sr. Technical Consultant

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Based in New Zealand & USA.

“…in the Real World?”

Lots of hype: stats get attention, as do big names.

“Real World?”

“…1.1 million client writes per second. Data was automatically replicated across all three zones making a total of 3.3 million writes per second across the cluster.”

http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

“Real World?”

“+10 clusters, +100s nodes, 250TB provisioned, 9 billion writes/day, 5 billion reads/day”

http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-cassandra-summit-2013

“Real World?”

• “but I don’t have an ∞ AMZN budget”
• “maybe one day I’ll have that much data”

“Real World!”

Most folks needed: real fault tolerance, scale-out characteristics.

“Real World!”

Most folks have: 3 to 12 nodes with 2-15TB, commodity hardware, small teams.

Cassandra in the Real World.

• Cassandra at 10k feet
• Case Studies
• Common Best Practices

Cassandra Architecture (briefly).

[Diagram: within each node, clients connect to the API layer, which sits on a cluster-aware layer and a cluster-unaware layer above the disk.]

Cassandra Cluster Architecture (briefly).

[Diagram: the same per-node stack repeated across Node 1 and Node 2, with clients able to connect to either node's API layer.]

Dynamo Cluster Architecture (briefly).

[Diagram: within each node, clients connect to the API layer, which sits on the Dynamo layer above the database layer and the disk; the stack is repeated across Node 1 and Node 2.]

Cassandra Architecture (briefly).

API / Dynamo / Database

API Transports.

Thrift | Native Binary

Thrift transport.

Extremely performant for specific workloads: Astyanax, disruptor-based HSHA in 2.0.

Native Binary Transport.

Focus of future development. Uses Netty, CQL 3 only, asynchronous.

API Services.

JMX | Thrift | CQL 3

Cassandra Architecture (briefly).

API / Dynamo / Database

Please see:
http://www.slideshare.net/aaronmorton/cassandra-community-webinar-introduction-to-apache-cassandra-12-20353118
http://www.slideshare.net/planetcassandra/c-summit-eu-2013-cassandra-internals
http://www.slideshare.net/aaronmorton/cassandra-community-webinar-august-29th-2013-in-case-of-emergency-break-glass

Cassandra in the Real World.

• Cassandra at 10k feet
• Case Studies
• Common Best Practices

Case Studies.

• Ad Tech
• Sensor Data
• Mobile Device Diagnostics

Ad Tech.

Latency = $$$

Ad Tech.

Large “Hot Data” set: active users, targeting, display count.

Ad Tech.

Huge Long Tail: who saw what; used for billing, campaign effectiveness over time, all sorts of analytics.

Ad Tech: Software.

Java: CQL via DataStax Java Driver
Python: Pycassa (Thrift)

Ad Tech: Cluster.

Cluster: 12 nodes, 2 datacenters, {DC1:R1:3, DC2:R2:3}.
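The {DC1:R1:3, DC2:R2:3} layout above corresponds to a keyspace replicated with NetworkTopologyStrategy: three replicas in each datacenter. A minimal sketch of how that spec maps to CQL (the keyspace name `adtech` and the helper are illustrative, not from the talk):

```python
# Render a CREATE KEYSPACE statement for a two-datacenter topology.
# Keyspace name "adtech" is hypothetical; the DC names and RF=3 per DC
# come from the slide's {DC1:R1:3, DC2:R2:3} notation.

def create_keyspace_cql(keyspace, replication):
    """Build a CQL CREATE KEYSPACE using NetworkTopologyStrategy."""
    opts = ["'class': 'NetworkTopologyStrategy'"]
    opts += ["'%s': %d" % (dc, rf) for dc, rf in sorted(replication.items())]
    return ("CREATE KEYSPACE %s WITH replication = {%s};"
            % (keyspace, ", ".join(opts)))

stmt = create_keyspace_cql("adtech", {"DC1": 3, "DC2": 3})
print(stmt)
```

Three replicas per datacenter lets each DC serve LOCAL_QUORUM operations on its own, which is why this shape is common for latency-sensitive, multi-DC deployments.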

Ad Tech: Systems.

Physical hardware: commodity 1U, 8x SSD, 36GB RAM, 10gigE + 4x1gigE.

Case Studies.

• Ad Tech
• Sensor Data
• Mobile Device Diagnostics

Sensor Data.

Latency != $$$

Sensor Data.

High write throughput: consistent “shape”, immutable data, large sequential reads, high uptime (for writes).

Sensor Data: Software.

REST application: separate reader service, writes to Kafka, ELB to multiple regions.

Sensor Data: Software.

Java: Thrift via Astyanax; read from Kafka and batch insertions to optimal size.
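Batching insertions "to optimal size" can be sketched as grouping consumed messages by total payload size before issuing each mutation. A pure-logic illustration (the 64KB threshold is an assumed tuning knob, not a figure from the talk):

```python
# Group incoming messages (e.g. from Kafka) into batches capped by
# total payload size, so each Cassandra batch mutation stays near an
# optimal size. MAX_BATCH_BYTES is an assumption for illustration.
MAX_BATCH_BYTES = 64 * 1024

def batch_by_size(messages, max_bytes=MAX_BATCH_BYTES):
    """Yield lists of messages whose combined size stays under max_bytes."""
    batch, size = [], 0
    for msg in messages:
        if batch and size + len(msg) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(msg)
        size += len(msg)
    if batch:
        yield batch

# Example: 100 one-KB sensor readings -> two batches (64 + 36 messages).
readings = [b"x" * 1024 for _ in range(100)]
batches = list(batch_by_size(readings))
```

In the real pipeline each yielded batch would become one insert operation, amortizing per-request overhead without letting any single mutation grow unbounded.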

Sensor Data: Cluster.

Cluster: 9 nodes, 1 availability zone, {RF: 3}.

Sensor Data: Systems.

m1.xlarge: 15GB, 2TB RAID0, “high”; tablesnap for backup.

Case Studies.

• Ad Tech
• Sensor Data
• Mobile Device Diagnostics

Device Diagnostics.

Latency = battery

Device Diagnostics.

Write bursts: large single payloads, large hot data set.

Device Diagnostics.

Huge long tail, but irrelevant after 2 months; external partner API*

*thar be dragons

Device Diagnostics: Software.

Java: CQL via DataStax Java Driver.

Device Diagnostics: Software.

REST application: payloads to S3, pointer in Kafka to payload.

Device Diagnostics: Cluster.

Cluster: 12 nodes, 3 availability zones, {us-east-1:1}.

Device Diagnostics: Systems.

i2.2xlarge: 61GB, 1.8TB RAID0 SSD, “Enhanced Networking”, dedicated ENI.

Device Diagnostics: Systems.

No Backups.

“Replay the front end.”
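“Replay the front end” works because this pipeline keeps every payload in S3 with a pointer in Kafka: lost data can be rebuilt by re-reading the log and re-applying each payload. A toy sketch of that idea (the dict-based log, blob store, and database are hypothetical stand-ins, not the project's code):

```python
# Rebuild database state by replaying pointers from the front-end log.
# "log" stands in for Kafka, "blob_store" for S3, and "db" for the
# Cassandra write path -- all hypothetical interfaces for illustration.

def replay(log, blob_store, db):
    """Re-apply every logged payload pointer to the database."""
    replayed = 0
    for pointer in log:                 # each entry names a blob key
        payload = blob_store[pointer]   # re-fetch the original payload
        db[pointer] = payload           # idempotent re-insert
        replayed += 1
    return replayed

# Example: three logged payloads, an empty database after data loss.
blob_store = {"k1": b"a", "k2": b"b", "k3": b"c"}
log = ["k1", "k2", "k3"]
db = {}
count = replay(log, blob_store, db)
```

Because re-inserting the same payload is idempotent, replay can safely overlap with whatever partial state survived, which is what makes "no backups" a defensible choice here.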

Cassandra in the Real World.

• Cassandra at 10k feet
• Case Studies
• Common Best Practices

Common Best Practices.

• API's
• Cluster Aware
• Cluster Unaware
• Clients
• Disk

Client Best Practices.

Decouple! Buffer writes for event-based systems; use asynchronous operations.
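The decoupling advice can be sketched with a simple in-process write buffer: producers append events and never block on the database, while a consumer drains the buffer asynchronously. The bounded queue here is a stand-in for a real broker such as Kafka; all names are illustrative:

```python
import queue
import threading

# Decouple event producers from the database with a bounded buffer.
# "writes" collects what a real consumer would send to Cassandra;
# the in-process queue is a stand-in for Kafka or similar.
buffer = queue.Queue(maxsize=1000)
writes = []

def consumer():
    """Drain buffered events and apply them as database writes."""
    while True:
        event = buffer.get()
        if event is None:           # sentinel: shut down
            break
        writes.append(event)        # real code: an async insert call
        buffer.task_done()

t = threading.Thread(target=consumer)
t.start()
for i in range(5):                  # producers never block on the DB
    buffer.put({"event_id": i})
buffer.put(None)
t.join()
```

The bounded `maxsize` gives natural backpressure: if the database slows down, producers eventually block on the buffer instead of silently piling up unbounded memory.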

Client Best Practices.

Use Official Drivers (but there are exceptions)

Client Best Practices.

CQL 3: collections, user-defined types, tooling available.

Common Best Practices.

API Best Practices.

Understand Replication!

API Best Practices.

Monitor & Instrument

Common Best Practices.

Cluster Best Practices.

Understand replication! Learn all you can about topology options.
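One concrete piece of replication math worth internalizing: whether a QUORUM operation can succeed depends only on the replication factor and how many replicas are reachable. A small illustration of the standard quorum arithmetic (not code from the talk):

```python
# QUORUM needs a majority of replicas: floor(RF/2) + 1.
# With RF=3 (as in the case studies above), one replica can be down
# and QUORUM reads/writes still succeed; two down means unavailability.

def quorum(rf):
    """Number of replicas a QUORUM operation must reach."""
    return rf // 2 + 1

def quorum_available(rf, live_replicas):
    """Can a QUORUM operation succeed with this many live replicas?"""
    return live_replicas >= quorum(rf)

assert quorum(3) == 2
assert quorum_available(3, 2)       # one replica down: still fine
assert not quorum_available(3, 1)   # two replicas down: unavailable
```

This is also the arithmetic behind "test failure scenarios explicitly": kill nodes and verify the consistency levels you use actually behave as this math predicts.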

Cluster Best Practices.

Verify Assumptions: test failure scenarios explicitly

Common Best Practices.

Systems Best Practices.

Better to have a lot of little commodity hardware*: 32-64GB of RAM (or more).

*10gigE is now commodity

Systems Best Practices.

BUT: do you have staff who can tune kernels? Larger hardware needs tuning: “receive packet steering”.

Systems Best Practices.

EC2: SSD instances if you can; use VPCs, placement groups, and ENIs.

Common Best Practices.

Storage Best Practices.

Dependent on workload, you can mix and match: rotational for commitlog and system, SSD for data.

Storage Best Practices.

SSD: consider JBOD; consumer grade works fine.

Storage Best Practices.

“What about SANs?”

NO.

(You would be moving a distributed system onto a centralized component.)

Storage Best Practices.

Backups: tablesnap on EC2, rsync (immutable data FTW!).
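Immutable SSTables are what make plain rsync viable as a backup strategy: a file that already exists at the destination never changes and never needs to be re-copied. A toy illustration of that property (paths and the helper are hypothetical; real backups would use rsync or tablesnap itself):

```python
import os
import shutil
import tempfile

# Because SSTables are immutable, an incremental backup only needs to
# copy files not already present at the destination -- the same
# property rsync exploits. File and directory names are illustrative.

def backup_sstables(src, dst):
    """Copy only SSTable files missing from the backup directory."""
    os.makedirs(dst, exist_ok=True)
    copied = []
    for name in os.listdir(src):
        target = os.path.join(dst, name)
        if not os.path.exists(target):      # immutable: skip known files
            shutil.copy2(os.path.join(src, name), target)
            copied.append(name)
    return copied

# Example: two SSTables, backed up twice -- the second pass is a no-op.
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
for n in ("a-Data.db", "b-Data.db"):
    with open(os.path.join(src, n), "w") as f:
        f.write("sstable")
first = backup_sstables(src, dst)    # copies both files
second = backup_sstables(src, dst)   # nothing new: copies none
```

The same immutability is why compaction, not backup, is where deleted SSTables disappear; a real tool also has to prune backups of files compaction has removed.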

Storage Best Practices.

Backups: combine rebuild + replay for best results. (Bonus: loading production data into staging is testing your backups!)

Thanks.

Nate McCall @zznate
Co-Founder & Sr. Technical Consultant
www.thelastpickle.com
