StampedeCon 2014: Cassandra in the Real World

STAMPEDECON 2014 CASSANDRA IN THE REAL WORLD Nate McCall @zznate Co-Founder & Sr. Technical Consultant Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License


Posted on 30-Oct-2014


DESCRIPTION

Three use cases of Apache Cassandra in real-world implementations and the best practices distilled from such.

TRANSCRIPT

Page 1: Stampede con 2014   cassandra in the real world

STAMPEDECON 2014

CASSANDRA IN THE REAL WORLD

Nate McCall @zznate

Co-Founder & Sr. Technical Consultant

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Page 2: Stampede con 2014   cassandra in the real world

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Based in New Zealand & USA.

Page 3: Stampede con 2014   cassandra in the real world

“…in the Real World?”

Lots of hype, stats get attention,

as do big names

Page 4: Stampede con 2014   cassandra in the real world

“Real World?”

“…1.1 million client writes per second. Data was automatically replicated across all three zones making a total of 3.3 million writes per second across the cluster.”

http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

Page 5: Stampede con 2014   cassandra in the real world

“Real World?”

“+10 clusters, +100s nodes, 250TB provisioned,

9 billion writes/day, 5 billion reads/day”

http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-cassandra-summit-2013

Page 6: Stampede con 2014   cassandra in the real world

“Real World?”

…

• “but I don’t have an ∞ AMZN budget”

• “maybe one day I’ll have that much data”

Page 7: Stampede con 2014   cassandra in the real world

“Real World!”

Most folks needed: real fault tolerance,

scale out characteristics

Page 8: Stampede con 2014   cassandra in the real world

“Real World!”

Most folks have: 3 to 12 nodes with 2-15TB,

commodity hardware, small teams

Page 9: Stampede con 2014   cassandra in the real world

Cassandra in the Real World.

Cassandra at 10k feet

Case Studies

Common Best Practices

Page 10: Stampede con 2014   cassandra in the real world

Cassandra Architecture (briefly).

APIs

Cluster Aware

Cluster Unaware

Clients

Disk

Page 11: Stampede con 2014   cassandra in the real world

Cassandra Cluster Architecture (briefly).

APIs

Cluster Aware

Cluster Unaware

Clients

Disk

APIs

Cluster Aware

Cluster Unaware

Disk

Node 1 Node 2

Page 12: Stampede con 2014   cassandra in the real world

Dynamo Cluster Architecture (briefly).

APIs

Dynamo

Database

Clients

Disk

APIs

Dynamo

Database

Disk

Node 1 Node 2

Page 13: Stampede con 2014   cassandra in the real world

Cassandra Architecture (briefly).

API, Dynamo, Database

Page 14: Stampede con 2014   cassandra in the real world

API Transports.

Thrift, Native Binary

Page 15: Stampede con 2014   cassandra in the real world

Thrift transport.

Extremely performant for specific workloads

Astyanax, disruptor-based HSHA in 2.0

Page 16: Stampede con 2014   cassandra in the real world

API Transports.

Thrift, Native Binary

Page 17: Stampede con 2014   cassandra in the real world

Native Binary Transport.

Focus of future development.

Uses Netty, CQL 3 only, asynchronous

Page 18: Stampede con 2014   cassandra in the real world

API Services.

JMX, Thrift, CQL 3

Page 19: Stampede con 2014   cassandra in the real world

API Services.

JMX, Thrift, CQL 3

Page 20: Stampede con 2014   cassandra in the real world

API Services.

JMX, Thrift, CQL 3

Page 21: Stampede con 2014   cassandra in the real world

Cassandra Architecture (briefly).

API, Dynamo, Database

Please see:

http://www.slideshare.net/aaronmorton/cassandra-community-webinar-introduction-to-apache-cassandra-12-20353118

http://www.slideshare.net/planetcassandra/c-summit-eu-2013-cassandra-internals

http://www.slideshare.net/aaronmorton/cassandra-community-webinar-august-29th-2013-in-case-of-emergency-break-glass

Page 22: Stampede con 2014   cassandra in the real world

Cassandra in the Real World.

Cassandra at 10k feet

Case Studies

Common Best Practices

Page 23: Stampede con 2014   cassandra in the real world

Case Studies.

Ad Tech

Sensor Data

Mobile Device Diagnostics

Page 24: Stampede con 2014   cassandra in the real world

Ad Tech.

Latency = $$$

Page 25: Stampede con 2014   cassandra in the real world

Ad Tech.

Large “Hot Data” set: active users, targeting, display count

Page 26: Stampede con 2014   cassandra in the real world

Ad Tech.

Huge Long Tail: who saw what, used for billing, campaign effectiveness over time, all sorts of analytics

Page 27: Stampede con 2014   cassandra in the real world

Ad Tech: Software.

Java: CQL via DataStax Java Driver

Python: Pycassa (Thrift)

Page 28: Stampede con 2014   cassandra in the real world

Ad Tech: Cluster.

Cluster: 12 nodes, 2 datacenters, {DC1:R1:3,DC2:R2:3}
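A minimal sketch of how the {DC1:R1:3,DC2:R2:3} layout might be expressed as a CQL keyspace definition, assuming the trailing 3 is the replication factor in each datacenter. The keyspace name `adtech` and the helper `keyspace_cql` are hypothetical, and the datacenter names must match whatever the cluster's snitch reports.

```python
def keyspace_cql(name, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a datacenter name to its replication factor.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in replicas_per_dc.items():
        options.append("'{}': {}".format(dc, int(rf)))
    return "CREATE KEYSPACE IF NOT EXISTS {} WITH replication = {{{}}};".format(
        name, ", ".join(options)
    )


if __name__ == "__main__":
    # Hypothetical keyspace name; three replicas in each of two datacenters.
    print(keyspace_cql("adtech", {"DC1": 3, "DC2": 3}))
```

The statement is only built as a string here; running it against a cluster would go through a driver session or cqlsh.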

Page 29: Stampede con 2014   cassandra in the real world

Ad Tech: Systems.

Physical Hardware: commodity 1U, 8xSSD, 36GB RAM, 10gigE + 4x1gigE

Page 30: Stampede con 2014   cassandra in the real world

Case Studies.

Ad Tech

Sensor Data

Mobile Device Diagnostics

Page 31: Stampede con 2014   cassandra in the real world

Sensor Data.

Latency != $$$

Page 32: Stampede con 2014   cassandra in the real world

Sensor Data.

High Write Throughput: consistent “shape”,

immutable data, large sequential reads,

high uptime (for writes)

Page 33: Stampede con 2014   cassandra in the real world

Sensor Data: Software.

REST application: separate reader service, writes to Kafka, ELB to multiple regions

Page 34: Stampede con 2014   cassandra in the real world

Sensor Data: Software.

Java: Thrift via Astyanax, reads from Kafka and batches insertions to an optimal size
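The batching step above can be sketched as a generator that groups an event stream into fixed-size chunks. This is an illustration only: the source system was Java with Astyanax, the helper name `batches` is hypothetical, any iterable stands in for the Kafka consumer, and the "optimal" size is something to find by measuring the workload rather than a known constant.

```python
from itertools import islice


def batches(events, batch_size):
    """Yield lists of up to batch_size events from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(events)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # range(7) stands in for a stream of sensor readings.
    for batch in batches(range(7), 3):
        print(batch)  # each batch would become one bulk insert
```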

Page 35: Stampede con 2014   cassandra in the real world

Sensor Data: Cluster.

Cluster: 9 nodes, 1 availability zone, {RF:3}

Page 36: Stampede con 2014   cassandra in the real world

Sensor Data: Systems.

m1.xlarge: 15GB, 2TB RAID0

“high”, tablesnap for backup

Page 37: Stampede con 2014   cassandra in the real world

Case Studies.

Ad Tech

Sensor Data

Mobile Device Diagnostics

Page 38: Stampede con 2014   cassandra in the real world

Device Diagnostics.

Latency = battery

Page 39: Stampede con 2014   cassandra in the real world

Device Diagnostics.

Write Bursts: large single payloads, large hot data set

Page 40: Stampede con 2014   cassandra in the real world

Device Diagnostics.

Huge long tail but irrelevant after 2 months, external partner API*

*thar be dragons

Page 41: Stampede con 2014   cassandra in the real world

Device Diagnostics: Software.

Java: CQL / DataStax Java Driver

Page 42: Stampede con 2014   cassandra in the real world

Device Diagnostics: Software.

REST application: payloads to S3, pointer in Kafka to payload
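One way to picture "payload to S3, pointer in the queue" is the sketch below. Everything in it is a stand-in: `BlobStore` is a hypothetical in-memory substitute for S3, a `deque` substitutes for the Kafka topic, and the message fields are invented for illustration.

```python
import hashlib
from collections import deque


class BlobStore:
    """In-memory stand-in for an object store such as S3 (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        # Content hash as the key, so re-uploading the same payload is harmless.
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key):
        return self._objects[key]


def accept_payload(store, queue, device_id, payload):
    """Store the large payload, then enqueue only a small pointer to it."""
    key = store.put(payload)
    queue.append({"device_id": device_id, "payload_key": key, "size": len(payload)})
    return key


def process_next(store, queue):
    """Consumer side: read the pointer, then fetch the payload it refers to."""
    message = queue.popleft()
    return message["device_id"], store.get(message["payload_key"])


if __name__ == "__main__":
    store, queue = BlobStore(), deque()
    accept_payload(store, queue, "device-1", b"x" * 1000)
    device, data = process_next(store, queue)
    print(device, len(data))
```

Keeping the queue messages small means a write burst of large payloads lands on the object store rather than on the queue or the database.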

Page 43: Stampede con 2014   cassandra in the real world

Device Diagnostics: Cluster.

Cluster: 12 nodes, 3 availability zones, {us-east-1:1}

Page 44: Stampede con 2014   cassandra in the real world

Device Diagnostics: Systems.

i2.2xlarge: 61GB, 1.8TB RAID0 SSD, “Enhanced Networking”, dedicated ENI

Page 45: Stampede con 2014   cassandra in the real world

Device Diagnostics: Systems.

No Backups.

Page 46: Stampede con 2014   cassandra in the real world

Device Diagnostics: Systems.

No Backups.

“Replay the front end.”

Page 47: Stampede con 2014   cassandra in the real world

Cassandra in the Real World.

Cassandra at 10k feet

Case Studies

Common Best Practices

Page 48: Stampede con 2014   cassandra in the real world

Common Best Practices.

APIs

Cluster Aware

Cluster Unaware

Clients

Disk

Page 49: Stampede con 2014   cassandra in the real world

Client Best Practices.

Decouple! Buffer writes for event-based systems, use asynchronous operations
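A rough sketch of the decoupling idea, using only the Python standard library: producers put events on a bounded in-process buffer and a separate writer coroutine drains it asynchronously. The names `writer`, `run_demo` and `write_batch` are hypothetical, and `write_batch` stands in for whatever asynchronous execute call a real driver provides.

```python
import asyncio


async def writer(queue, write_batch, max_batch=100):
    """Drain the buffer and pass small batches to an async write function."""
    while True:
        event = await queue.get()
        if event is None:  # sentinel: producer is finished
            return
        batch = [event]
        # Opportunistically pick up whatever else is already buffered.
        while len(batch) < max_batch and not queue.empty():
            nxt = queue.get_nowait()
            if nxt is None:
                await write_batch(batch)
                return
            batch.append(nxt)
        await write_batch(batch)


async def run_demo(events):
    written = []

    async def write_batch(batch):
        # Stand-in for an asynchronous database write.
        await asyncio.sleep(0)
        written.append(list(batch))

    queue = asyncio.Queue(maxsize=1000)  # bounded, so producers feel back-pressure
    task = asyncio.create_task(writer(queue, write_batch, max_batch=3))
    for event in events:
        await queue.put(event)  # returns as soon as the event is buffered
    await queue.put(None)
    await task
    return written


if __name__ == "__main__":
    print(asyncio.run(run_demo(range(5))))
```

The bounded queue is the design choice worth noting: an unbounded buffer hides a slow database until memory runs out, while a bounded one pushes the slowdown back to the producer.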

Page 50: Stampede con 2014   cassandra in the real world

Client Best Practices.

Use Official Drivers (but there are exceptions)

Page 51: Stampede con 2014   cassandra in the real world

Client Best Practices.

CQL3: collections,

user defined types, tooling available

Page 52: Stampede con 2014   cassandra in the real world

Common Best Practices.

APIs

Cluster Aware

Cluster Unaware

Clients

Disk

Page 53: Stampede con 2014   cassandra in the real world

API Best Practices.

Understand Replication!

Page 54: Stampede con 2014   cassandra in the real world

API Best Practices.

Monitor & Instrument
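As one possible reading of "instrument", the sketch below times application-side operations and reports nearest-rank percentiles. The class `LatencyRecorder` and the operation names are hypothetical; a production system would more likely feed a metrics library than keep raw samples in memory.

```python
import time
from contextlib import contextmanager


class LatencyRecorder:
    """Collects per-operation latencies (seconds) and reports percentiles."""

    def __init__(self):
        self.samples = {}

    @contextmanager
    def timed(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.samples.setdefault(name, []).append(elapsed)

    def percentile(self, name, pct):
        """Nearest-rank percentile; pct is an integer from 1 to 100."""
        data = sorted(self.samples[name])
        rank = -(-pct * len(data) // 100)  # ceiling division on integers
        return data[max(0, rank - 1)]


if __name__ == "__main__":
    recorder = LatencyRecorder()
    for _ in range(20):
        with recorder.timed("read"):
            time.sleep(0.001)  # stand-in for a database read
    print("p99 read latency (s):", recorder.percentile("read", 99))
```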

Page 55: Stampede con 2014   cassandra in the real world

Common Best Practices.

APIs

Cluster Aware

Cluster Unaware

Clients

Disk

Page 56: Stampede con 2014   cassandra in the real world

Cluster Best Practices.

Understand Replication! Learn all you can about topology options
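Part of understanding replication is knowing how many replicas must answer at each consistency level. The sketch below works through that arithmetic for a two-datacenter layout with three replicas each; the function names are hypothetical and only four consistency levels are covered.

```python
def quorum(replication_factor):
    """A quorum is a majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level, rf_by_dc, local_dc):
    """Number of replica acknowledgements needed at a consistency level."""
    total = sum(rf_by_dc.values())
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])
    if level == "QUORUM":
        return quorum(total)
    if level == "ALL":
        return total
    raise ValueError("unsupported consistency level: " + level)


if __name__ == "__main__":
    rf = {"DC1": 3, "DC2": 3}
    for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "ALL"):
        print(level, replicas_required(level, rf, "DC1"))
```

With three replicas per datacenter, LOCAL_QUORUM needs two local replicas while QUORUM needs four across both datacenters, so QUORUM requests depend on the remote datacenter being reachable.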

Page 57: Stampede con 2014   cassandra in the real world

Cluster Best Practices.

Verify Assumptions: test failure scenarios explicitly
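Before killing nodes in a test cluster, it can help to write down which failures are expected to be survivable. The sketch below enumerates every combination of down nodes for one partition's replica set and checks whether a LOCAL_QUORUM request could still succeed. The node names and the function `local_quorum_outcomes` are hypothetical, and this paper exercise complements real failure testing rather than replacing it.

```python
from itertools import combinations


def local_quorum_outcomes(replicas_by_dc, local_dc, nodes_down):
    """Split all failure combinations into survivable and fatal ones.

    replicas_by_dc maps a datacenter name to the replicas of one partition.
    """
    all_nodes = [node for nodes in replicas_by_dc.values() for node in nodes]
    local = set(replicas_by_dc[local_dc])
    needed = len(local) // 2 + 1
    ok, failed = [], []
    for down in combinations(all_nodes, nodes_down):
        alive_local = local - set(down)
        (ok if len(alive_local) >= needed else failed).append(down)
    return ok, failed


if __name__ == "__main__":
    replicas = {"DC1": ["a", "b", "c"], "DC2": ["d", "e", "f"]}
    ok, failed = local_quorum_outcomes(replicas, "DC1", nodes_down=2)
    print(len(ok), "survivable,", len(failed), "fatal:", failed)
```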

Page 58: Stampede con 2014   cassandra in the real world

Common Best Practices.

APIs

Cluster Aware

Cluster Unaware

Clients

Disk

Page 59: Stampede con 2014   cassandra in the real world

Systems Best Practices.

Better to have a lot of a little: commodity hardware*, 32-64GB of RAM (or more)

*10gigE is now commodity

Page 60: Stampede con 2014   cassandra in the real world

Systems Best Practices.

BUT: do you have staff that can tune kernels?

Larger hardware needs tuning: “receive packet steering”

Page 61: Stampede con 2014   cassandra in the real world

Systems Best Practices.

EC2: SSD instances if you can, use VPCs, deployment groups and ENIs

Page 62: Stampede con 2014   cassandra in the real world

Common Best Practices.

APIs

Cluster Aware

Cluster Unaware

Clients

Disk

Page 63: Stampede con 2014   cassandra in the real world

Storage Best Practices.

Depending on workload, you can mix and match: rotational for commitlog and system

Page 64: Stampede con 2014   cassandra in the real world

Storage Best Practices.

You can mix and match: rotational for commitlog and

system, SSD for data

Page 65: Stampede con 2014   cassandra in the real world

Storage Best Practices.

SSD: consider JBOD, consumer grade works fine

Page 66: Stampede con 2014   cassandra in the real world

Storage Best Practices.

“What about SANs?”

Page 67: Stampede con 2014   cassandra in the real world

Storage Best Practices.

“What about SANs?”

NO.

(You would be moving a distributed system onto a centralized component)

Page 68: Stampede con 2014   cassandra in the real world

Storage Best Practices.

Backups: tablesnap on EC2,

rsync (immutable data FTW!)
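Because data files are not modified after they are written, an incremental backup only has to copy files it has not seen before. The sketch below shows that idea with the standard library; it is not how tablesnap or rsync are implemented, and the function name `sync_new_files` and the directory paths are hypothetical.

```python
import shutil
from pathlib import Path


def sync_new_files(source_dir, backup_dir):
    """Copy files that are not yet in the backup; return the names copied.

    Relies on files being immutable: a name already present in the backup
    is assumed to have identical content and is skipped.
    """
    source, backup = Path(source_dir), Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.iterdir()):
        target = backup / path.name
        if path.is_file() and not target.exists():
            shutil.copy2(path, target)
            copied.append(path.name)
    return copied


if __name__ == "__main__":
    # Hypothetical paths; point these at a snapshot directory and a backup volume.
    print(sync_new_files("/tmp/example-snapshot", "/tmp/example-backup"))
```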

Page 69: Stampede con 2014   cassandra in the real world

Storage Best Practices.

Backups: combine rebuild+replay for

best results (Bonus: loading production data to staging is

testing your backups!)

Page 70: Stampede con 2014   cassandra in the real world

Thanks.

Page 71: Stampede con 2014   cassandra in the real world

Nate McCall @zznate

Co-Founder & Sr. Technical Consultant

www.thelastpickle.com