devops kc

©2014 DataStax. Do not distribute without consent. DataStax, Philip Thompson, Software Engineer, Apache Cassandra

Upload: philip-thompson

Post on 30-Jun-2015

Category:

Software


DESCRIPTION

Devops KC presentation on Apache Cassandra

TRANSCRIPT

Page 1: Devops kc

DataStax

Philip Thompson, Software Engineer

Apache Cassandra

Page 2: Devops kc

Who I am

• Philip Thompson

• Software Engineer at DataStax

• Contributor to Apache Cassandra

• A maintainer of CCM, the Cassandra Cluster Manager

Page 3: Devops kc

Apache Cassandra™

• Apache Cassandra™ is a massively scalable, open source, NoSQL, distributed database built for modern, mission-critical online applications.

• Written in Java; a hybrid of Amazon Dynamo and Google BigTable

• Masterless with no single point of failure

• Distributed and data centre aware

• 100% uptime

• Predictable scaling

Page 4: Devops kc

©2012 DataStax

Page 7: Devops kc

http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html

Page 8: Devops kc

Cluster Architecture

Page 9: Devops kc

Data Distribution

[Figure: token ring with tokens 0, 25, 50, 75]

Murmur3_Hash_Function(Partition Key) >> Token
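
The mapping from partition key to token to owning node can be sketched as follows. This is a toy illustration, not Cassandra's implementation: the ring is shrunk to 0..99 to match the figure, `hashlib.md5` stands in for Murmur3 (which is not in the Python standard library), and the node names are hypothetical.

```python
import bisect
import hashlib

# Toy ring matching the slide: four nodes at tokens 0, 25, 50, 75.
# Node names are hypothetical.
RING = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
TOKENS = [token for token, _ in RING]


def toy_token(partition_key: str) -> int:
    """Hash a partition key onto the toy 0..99 ring.

    ASSUMPTION: md5 is only a stand-in for Murmur3, and the real
    token space is far larger than 100.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def owner_for_token(token: int) -> str:
    """A node with token T owns the range (previous token, T]."""
    index = bisect.bisect_left(TOKENS, token)
    # Tokens above the highest node token wrap around to the first node.
    return RING[index % len(RING)][1]


def owner_for_key(partition_key: str) -> str:
    return owner_for_token(toy_token(partition_key))
```

For example, token 10 falls in (0, 25] and is owned by the node at token 25, while token 99 wraps around to the node at token 0.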

Page 10: Devops kc

Cassandra - More than one server

• All nodes participate in a cluster

• Shared nothing

• Add or remove as needed

• More capacity? Add a server

• Each node owns a number of tokens

• Tokens denote a range of keys

• 4 nodes? -> Key range/4

• Each node owns 1/4 the data

Page 11: Devops kc

Cassandra - Locally Distributed

• Client writes to any node

• Node coordinates with others

• Data replicated in parallel

• Replication factor (RF): How many copies of your data?

• RF = 3 here

With RF = 3 on four nodes, each node stores 3/4 of the cluster's total data.
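
The 3/4 figure can be checked with a small sketch. It assumes the simplest placement rule (the owner of a range plus the next RF - 1 nodes clockwise on the ring) and four hypothetical node names; real placement also depends on the replication strategy and snitch.

```python
from fractions import Fraction

# Hypothetical node names, listed in ring order.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def replicas_for_range(primary_index: int, rf: int) -> list:
    """Owner of the range plus the next rf - 1 nodes clockwise.

    ASSUMPTION: simple ring-walk placement, ignoring racks and DCs.
    """
    return [NODES[(primary_index + i) % len(NODES)] for i in range(rf)]


def fraction_stored_per_node(rf: int) -> dict:
    """Fraction of all data each node holds, with equal-sized ranges."""
    counts = {node: 0 for node in NODES}
    for primary_index in range(len(NODES)):
        for node in replicas_for_range(primary_index, rf):
            counts[node] += 1
    return {node: Fraction(count, len(NODES)) for node, count in counts.items()}
```

With `rf=3`, every node appears in the replica list of three of the four ranges, which gives the 3/4 from the slide.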

Page 12: Devops kc

Cassandra - Geographically Distributed

• Client writes to its local data center

• Data syncs across WAN

• Replication Factor per DC

Single coordinator

Page 13: Devops kc

Cassandra - Replication Factor

• Replication factor (RF): How many copies of your data?

• Replication Factor is set per keyspace

• Can be altered by operator

RF = 3
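
As a rough illustration of "set per keyspace" and "can be altered", the helpers below build the corresponding CQL statements as strings. The keyspace name `demo` and the data center names `dc1` and `dc2` are hypothetical, and the helpers only format text; they do not talk to a cluster.

```python
def replication_clause(rf_per_dc: dict) -> str:
    """Format a NetworkTopologyStrategy replication map, one RF per DC."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in rf_per_dc.items()]
    return "{" + ", ".join(parts) + "}"


def create_keyspace_cql(name: str, rf_per_dc: dict) -> str:
    return f"CREATE KEYSPACE {name} WITH replication = {replication_clause(rf_per_dc)};"


def alter_keyspace_cql(name: str, rf_per_dc: dict) -> str:
    return f"ALTER KEYSPACE {name} WITH replication = {replication_clause(rf_per_dc)};"


# Hypothetical names: keyspace "demo", data centers "dc1" and "dc2".
print(create_keyspace_cql("demo", {"dc1": 3}))
print(alter_keyspace_cql("demo", {"dc1": 3, "dc2": 2}))
```

After changing the replication factor of an existing keyspace, a repair is needed so the new replicas actually receive the data.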

Page 14: Devops kc

Cassandra - Consistency

• Consistency Level (CL)

• Client specifies per read or write

• ALL = All replicas ack

• QUORUM = a majority (more than 50%) of replicas ack

• LOCAL_QUORUM = a majority (more than 50%) of replicas in the local DC ack

• ONE = Only one replica acks
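
The number of acknowledgements each level waits for can be sketched as below. A quorum is floor(replicas / 2) + 1; the data center names are hypothetical and only the four levels from the slide are covered.

```python
def quorum(replicas: int) -> int:
    """Smallest strict majority of the given number of replicas."""
    return replicas // 2 + 1


def required_acks(level: str, rf_per_dc: dict, local_dc: str) -> int:
    """Acks needed for one request, for the levels listed on the slide."""
    total_replicas = sum(rf_per_dc.values())
    if level == "ALL":
        return total_replicas
    if level == "QUORUM":
        return quorum(total_replicas)
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])
    if level == "ONE":
        return 1
    raise ValueError(f"level not covered by this sketch: {level}")


# Hypothetical two-DC keyspace with RF = 3 in each data center.
rf = {"dc1": 3, "dc2": 3}
for level in ("ALL", "QUORUM", "LOCAL_QUORUM", "ONE"):
    print(level, required_acks(level, rf, local_dc="dc1"))
```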

Page 15: Devops kc

Cassandra - Transparent to the application

• A single node failure shouldn’t cause requests to fail

• Replication Factor + Consistency Level = Success

• This example:

• RF = 3

• CL = QUORUM

2 of 3 replicas ack (more than 50%), so we are good!
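
The arithmetic behind this slide, as a minimal sketch: a QUORUM request succeeds as long as the replicas still alive can form a majority. The function names are illustrative only.

```python
def quorum(rf: int) -> int:
    """Smallest strict majority of rf replicas."""
    return rf // 2 + 1


def quorum_request_succeeds(rf: int, replicas_down: int) -> bool:
    """True if enough replicas remain to acknowledge a QUORUM request."""
    live_replicas = rf - replicas_down
    return live_replicas >= quorum(rf)


def tolerable_failures(rf: int) -> int:
    """Replica failures a QUORUM request can survive."""
    return rf - quorum(rf)
```

With RF = 3, one replica can be down and QUORUM still succeeds; with two down it fails.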

Page 16: Devops kc

Cassandra - Scaling

• Take a cluster of four nodes

• Where does the fifth node go?

• Rebalancing is costly

[Figure: four-node token ring with tokens 0, 25, 50, 75]
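
Why the fifth node is awkward can be seen with a little arithmetic on the toy 0..99 ring from the figure. In this sketch (single token per node, hypothetical token choice of 12 for the new node), dropping the node into one existing range leaves the ring uneven, while an even layout requires moving most existing tokens.

```python
RING_SIZE = 100  # toy ring matching the figure, not the real token space


def tokens_owned(node_tokens: list) -> dict:
    """Map each node token to the size of the range (previous, token] it owns."""
    ordered = sorted(node_tokens)
    owned = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps to the last token
        owned[token] = (token - previous) % RING_SIZE or RING_SIZE
    return owned


def balanced_tokens(node_count: int) -> list:
    """Evenly spaced tokens for the given number of nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


old = [0, 25, 50, 75]
print(tokens_owned(old + [12]))   # new node splits only one range
moved = sorted(set(old) - set(balanced_tokens(5)))
print(moved)                      # existing tokens that must move to rebalance
```

Splitting one range relieves only a single node; rebalancing evenly to 0, 20, 40, 60, 80 means three of the four existing nodes change tokens and stream data.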

Page 19: Devops kc

Gossip

• Manages cluster state

• Nodes up/down

• Nodes joining/leaving

• Decentralized

• “Heartbeat” every second

• Every node contacts 1-3 other nodes
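
A toy simulation of the heartbeat exchange described above. It assumes each node bumps its own heartbeat once per round and then swaps state with 1 to 3 randomly chosen peers, keeping the highest heartbeat seen for every node. Real gossip carries much more state and does failure detection; the node names are hypothetical.

```python
import random


def new_cluster(names: list) -> dict:
    """Each node starts out knowing only its own heartbeat (0)."""
    return {name: {name: 0} for name in names}


def gossip_round(views: dict, rng: random.Random) -> None:
    """One 'second' of gossip: every node ticks, then contacts 1-3 peers."""
    members = list(views)
    for node in members:
        views[node][node] += 1  # bump own heartbeat
        others = [m for m in members if m != node]
        peer_count = rng.randint(1, min(3, len(others)))
        for peer in rng.sample(others, peer_count):
            # Both sides keep the highest heartbeat seen for every node.
            known = set(views[node]) | set(views[peer])
            merged = {n: max(views[node].get(n, 0), views[peer].get(n, 0))
                      for n in known}
            views[node] = dict(merged)
            views[peer] = dict(merged)


views = new_cluster(["a", "b", "c", "d", "e"])
rng = random.Random(7)
for _ in range(5):
    gossip_round(views, rng)
print({name: len(view) for name, view in views.items()})
```

Because no node is special, state spreads without any central coordinator, which is the "decentralized" point on the slide.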

Page 20: Devops kc

Snitch

• Responsible for determining cluster topology

• Datacenter awareness

• Tracks node responsiveness

• Many snitches provided out of the box

• SimpleSnitch

• GossipingPropertyFileSnitch (recommended for production)

• Ec2Snitch and Ec2MultiRegionSnitch

• For use with AWS

• Comparable GCE snitch has just been added

• Custom snitches can be added
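
To make "determining cluster topology" concrete, here is a toy stand-in for what a snitch provides: a mapping from node address to data center and rack, used to order replicas by proximity. The addresses, DC names, and rack names are hypothetical, and this is not the Cassandra snitch interface.

```python
# Hypothetical topology: address -> (data center, rack).
TOPOLOGY = {
    "10.0.0.1": ("dc1", "rack1"),
    "10.0.0.2": ("dc1", "rack1"),
    "10.0.0.3": ("dc1", "rack2"),
    "10.1.0.1": ("dc2", "rack1"),
}


def proximity(local: str, other: str) -> int:
    """Lower is closer: self, same rack, same DC, remote DC."""
    if other == local:
        return 0
    local_dc, local_rack = TOPOLOGY[local]
    other_dc, other_rack = TOPOLOGY[other]
    if local_dc == other_dc and local_rack == other_rack:
        return 1
    if local_dc == other_dc:
        return 2
    return 3


def sort_by_proximity(local: str, replicas: list) -> list:
    """Order replicas so the coordinator prefers nearby nodes."""
    return sorted(replicas, key=lambda address: proximity(local, address))
```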

Page 21: Devops kc

Anti-Entropy - Read Repair

Page 22: Devops kc

Anti-Entropy - Hinted Handoff

• Three hour window

• Hints are replayed when node is restored

• Stored in system.hints table on coordinator

• Cassandra does not copy Dynamo’s “sloppy quorum”
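
The hint lifecycle above can be sketched with a toy coordinator. Everything here is illustrative: times are plain seconds, a replica is just a list of mutations, and the class is hypothetical rather than any Cassandra API.

```python
HINT_WINDOW_SECONDS = 3 * 60 * 60  # the three hour window from the slide


class ToyCoordinator:
    def __init__(self):
        self.hints = []       # (target node, mutation) pairs kept locally
        self.down_since = {}  # node -> time it was first seen down

    def mark_down(self, node: str, now: int) -> None:
        self.down_since.setdefault(node, now)

    def write(self, node: str, mutation: str, now: int, replica: list) -> str:
        if node not in self.down_since:
            replica.append(mutation)
            return "applied"
        if now - self.down_since[node] <= HINT_WINDOW_SECONDS:
            self.hints.append((node, mutation))
            return "hinted"
        # Past the window no hint is stored; repair must fix this later.
        return "dropped"

    def mark_up(self, node: str, replica: list) -> int:
        """Replay stored hints to the restored node; return how many."""
        self.down_since.pop(node, None)
        replay = [m for target, m in self.hints if target == node]
        self.hints = [(t, m) for t, m in self.hints if t != node]
        replica.extend(replay)
        return len(replay)
```

Writes that arrive after the window has passed are the reason hinted handoff alone is not enough and regular repair is still needed.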

Page 23: Devops kc

Anti-Entropy - Repair

• nodetool repair

• Uses Merkle trees for data comparison

• Should be run weekly

• Cassandra 2.1 has drastically improved repair times, thanks to incremental repair
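
How a Merkle tree narrows down which ranges differ, as a minimal sketch. It assumes each replica's data is already split into the same power-of-two number of ranges and uses SHA-256 from the standard library; the range contents are made-up bytes.

```python
import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(ranges: list) -> list:
    """Return tree levels, leaves first and root last.

    ASSUMPTION: len(ranges) is a power of two.
    """
    levels = [[digest(chunk) for chunk in ranges]]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append([digest(below[i] + below[i + 1])
                       for i in range(0, len(below), 2)])
    return levels


def differing_ranges(tree_a: list, tree_b: list) -> list:
    """Walk down from the root, following only subtrees whose hashes differ."""
    pending = [(len(tree_a) - 1, 0)]
    different = []
    while pending:
        level, index = pending.pop()
        if tree_a[level][index] == tree_b[level][index]:
            continue  # identical subtree, nothing to compare below it
        if level == 0:
            different.append(index)
        else:
            pending.append((level - 1, 2 * index))
            pending.append((level - 1, 2 * index + 1))
    return sorted(different)


replica_a = [b"range-0", b"range-1", b"range-2", b"range-3"]
replica_b = [b"range-0", b"range-1", b"stale", b"range-3"]
print(differing_ranges(build_tree(replica_a), build_tree(replica_b)))
```

Only the ranges reported as different need to be streamed between replicas, which is what keeps the comparison cheap relative to shipping all the data.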

Page 24: Devops kc

Node Architecture

Page 25: Devops kc

Write Path

[Figure: a Write goes to the commit log (Disk) and the Memtable (Memory); the Memtable is later flushed to an SSTable (Disk)]

Page 26: Devops kc

Write Path

• By default the commit log is fsynced to disk every 10s

• This can be configured in cassandra.yaml

[Figure: Write -> commit log and Memtable -> SSTable]
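
The write path in the figure, reduced to a toy in-memory model. It is a sketch under simplifying assumptions: the "commit log" is a Python list rather than a file, the flush trigger is a row count, and the class name is hypothetical.

```python
class ToyNode:
    def __init__(self, flush_threshold: int = 2):
        self.commit_log = []  # stand-in for the append-only log on disk
        self.memtable = {}    # in-memory, most recent value per key
        self.sstables = []    # immutable, sorted snapshots "on disk"
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> None:
        # 1. Append to the commit log for durability.
        self.commit_log.append((key, value))
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush once the memtable is "full" (toy trigger: row count).
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as a sorted, immutable SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Log entries for flushed data are no longer needed for recovery.
        self.commit_log.clear()
```

Nothing in the write path reads existing data or updates files in place, which is why writes stay cheap.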

Page 27: Devops kc

Read Path

[Figure: a Read consults the Memtable (Memory) and multiple SSTables (Disk)]
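
A read has to merge what is in the memtable with what is in every SSTable that may hold the key, and the most recent write wins. The sketch below models each store as a dict of key to (value, write timestamp); it leaves out the structures real nodes use to avoid touching every SSTable, such as bloom filters and caches.

```python
def read(key: str, memtable: dict, sstables: list):
    """Return the newest value for key across all stores, or None.

    Each store maps key -> (value, write_timestamp).
    """
    cells = [store[key] for store in [memtable, *sstables] if key in store]
    if not cells:
        return None
    value, _timestamp = max(cells, key=lambda cell: cell[1])
    return value


# Made-up data: k1 was written three times, k2 only once.
memtable = {"k1": ("new", 30)}
sstables = [
    {"k1": ("old", 10), "k2": ("only-on-disk", 5)},
    {"k1": ("older", 1)},
]
print(read("k1", memtable, sstables))
```

The more SSTables a key is spread across, the more work a read does, which is the motivation for compaction.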

Page 28: Devops kc

Read Path

Page 29: Devops kc

Compaction

Page 30: Devops kc

Compaction

Page 31: Devops kc

Debugging your data model

• Tracing

cqlsh> tracing on;
Now tracing requests.

cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

 activity                            | timestamp    | source    | source_elapsed
-------------------------------------+--------------+-----------+----------------
 execute_cql3_query                  | 00:02:37,015 | 127.0.0.1 |              0
 Parsing statement                   | 00:02:37,015 | 127.0.0.1 |             81
 Preparing statement                 | 00:02:37,015 | 127.0.0.1 |            273
 Determining replicas for mutation   | 00:02:37,015 | 127.0.0.1 |            540
 Sending message to /127.0.0.2       | 00:02:37,015 | 127.0.0.1 |            779
 Messsage received from /127.0.0.1   | 00:02:37,016 | 127.0.0.2 |             63
 Applying mutation                   | 00:02:37,016 | 127.0.0.2 |            220
 Acquiring switchLock                | 00:02:37,016 | 127.0.0.2 |            250
 Appending to commitlog              | 00:02:37,016 | 127.0.0.2 |            277
 Adding to memtable                  | 00:02:37,016 | 127.0.0.2 |            378
 Enqueuing response to /127.0.0.1    | 00:02:37,016 | 127.0.0.2 |            710
 Sending message to /127.0.0.1       | 00:02:37,016 | 127.0.0.2 |            888
 Messsage received from /127.0.0.2   | 00:02:37,017 | 127.0.0.1 |           2334
 Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2550
 Request complete                    | 00:02:37,017 | 127.0.0.1 |           2581
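
One way to read a trace like this is to look at the gaps between consecutive steps on the same node, since source_elapsed is the time in microseconds since that node started working on the request. The sketch below copies the rows from the trace above into a list and finds the largest gap per source; the helper name is illustrative.

```python
# (activity, source, source_elapsed in microseconds), copied from the trace.
TRACE = [
    ("execute_cql3_query", "127.0.0.1", 0),
    ("Parsing statement", "127.0.0.1", 81),
    ("Preparing statement", "127.0.0.1", 273),
    ("Determining replicas for mutation", "127.0.0.1", 540),
    ("Sending message to /127.0.0.2", "127.0.0.1", 779),
    ("Messsage received from /127.0.0.1", "127.0.0.2", 63),
    ("Applying mutation", "127.0.0.2", 220),
    ("Acquiring switchLock", "127.0.0.2", 250),
    ("Appending to commitlog", "127.0.0.2", 277),
    ("Adding to memtable", "127.0.0.2", 378),
    ("Enqueuing response to /127.0.0.1", "127.0.0.2", 710),
    ("Sending message to /127.0.0.1", "127.0.0.2", 888),
    ("Messsage received from /127.0.0.2", "127.0.0.1", 2334),
    ("Processing response from /127.0.0.2", "127.0.0.1", 2550),
    ("Request complete", "127.0.0.1", 2581),
]


def largest_gap(trace: list, source: str):
    """Return (activity, gap) for the step that waited longest on a source."""
    rows = [(activity, elapsed) for activity, src, elapsed in trace
            if src == source]
    best = None
    for (_, previous), (activity, current) in zip(rows, rows[1:]):
        gap = current - previous
        if best is None or gap > best[1]:
            best = (activity, gap)
    return best


print(largest_gap(TRACE, "127.0.0.1"))
print(largest_gap(TRACE, "127.0.0.2"))
```

On the coordinator the largest gap is the wait for the replica's response, so most of this request's latency is the network round trip rather than local work.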

Page 32: Devops kc

Nodetool

• Command line interface for monitoring Cassandra and performing routine database operations

• Commands for viewing detailed metrics for tables, server metrics, and compaction statistics:

• cfstats: statistics for each table and keyspace

• cfhistograms: statistics about a table, including read/write latency, row size, column count, and number of SSTables

• netstats: statistics about network operations and connections

• tpstats: statistics about the number of active, pending, and completed tasks for each stage of Cassandra operations by thread pool
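
For scripted monitoring, these commands can be wrapped in a small helper. This is a sketch that assumes `nodetool` is on the PATH and that the node's JMX port is reachable from where the script runs; the keyspace and table names in the example are hypothetical.

```python
import subprocess


def nodetool_argv(command: str, *args: str, host: str = "127.0.0.1") -> list:
    """Build the argument list for a nodetool invocation against one host."""
    return ["nodetool", "-h", host, command, *args]


def run_nodetool(command: str, *args: str, host: str = "127.0.0.1") -> str:
    """Run nodetool and return its text output (raises if it fails)."""
    result = subprocess.run(
        nodetool_argv(command, *args, host=host),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example argument lists; "my_keyspace" and "my_table" are hypothetical.
print(nodetool_argv("tpstats"))
print(nodetool_argv("cfhistograms", "my_keyspace", "my_table", host="10.0.0.5"))
```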

Page 33: Devops kc

Try it out

Page 34: Devops kc

Cassandra

• Download from source:

• git clone git://git.apache.org/cassandra.git

• Packaged install and tarballs available:

• http://www.datastax.com/documentation/cassandra/2.1/cassandra/install/install_cassandraTOC.html

Page 35: Devops kc

CCM

• CCM - Cassandra Cluster Manager

• https://github.com/pcmanus/ccm

• Warning: not lightweight

• Example:

• ccm create test -v 2.0.1

• ccm populate -n 3

• ccm start

Page 36: Devops kc

Clients

• cqlsh

• Bundled with Cassandra

• Drivers

• java: https://github.com/datastax/java-driver

• python: https://github.com/datastax/python-driver

• .net: https://github.com/datastax/csharp-driver

• and more: http://www.datastax.com/download/clientdrivers

• Ruby, C/C++, NodeJS

Page 37: Devops kc

Get Help

• IRC: #cassandra on freenode

• Mailing Lists

• Subscribe at cassandra.apache.org

• Stack Overflow

• DataStax Docs

• http://www.datastax.com/docs

Page 38: Devops kc

Questions?
